Introduction

Our research applies Exploratory Data Analysis (EDA) to Google Play Store apps to uncover patterns, trends, and insights about app characteristics, user behavior, and installation patterns. We examine how app popularity, defined by the number of installs together with high review counts and ratings, is affected by category, last-update date, app size, version, and other factors.

Smart Question

“What is the impact of content rating, required Android version, app category, size, and pricing on predicting app success in terms of positive ratings and high user reviews, as well as the number of installs, using data from Google Play Store apps from 2010 to 2018?”

Specific: The question clearly defines the variables (content rating, required Android version, app category, size, pricing) and the outcomes (positive ratings, high user reviews, number of installs).

Measurable: The outcomes (positive ratings, high user reviews, number of installs) are quantifiable.

Achievable: Given the availability of Google Play Store data from 2010 to 2018, the analysis is feasible.

Relevant: The question addresses a significant issue in the app development and marketing industry: predicting app success.

Time-specific: The timeframe (2010-2018) is clearly defined.

Data Preparation and Cleaning

Here, we load the ‘Google Play Store Apps’ dataset from a CSV file using read.csv, which reads the file into a data frame, and assign the result to the variable data_apps.

data_apps <- read.csv("googleplaystore.csv")  # Load the dataset (read.csv already returns a data frame)
str(data_apps)  # Check the structure of the data
## 'data.frame':    10841 obs. of  13 variables:
##  $ App           : chr  "Photo Editor & Candy Camera & Grid & ScrapBook" "Coloring book moana" "U Launcher Lite – FREE Live Cool Themes, Hide Apps" "Sketch - Draw & Paint" ...
##  $ Category      : chr  "ART_AND_DESIGN" "ART_AND_DESIGN" "ART_AND_DESIGN" "ART_AND_DESIGN" ...
##  $ Rating        : num  4.1 3.9 4.7 4.5 4.3 4.4 3.8 4.1 4.4 4.7 ...
##  $ Reviews       : chr  "159" "967" "87510" "215644" ...
##  $ Size          : chr  "19M" "14M" "8.7M" "25M" ...
##  $ Installs      : chr  "10,000+" "500,000+" "5,000,000+" "50,000,000+" ...
##  $ Type          : chr  "Free" "Free" "Free" "Free" ...
##  $ Price         : chr  "0" "0" "0" "0" ...
##  $ Content.Rating: chr  "Everyone" "Everyone" "Everyone" "Teen" ...
##  $ Genres        : chr  "Art & Design" "Art & Design;Pretend Play" "Art & Design" "Art & Design" ...
##  $ Last.Updated  : chr  "January 7, 2018" "January 15, 2018" "August 1, 2018" "June 8, 2018" ...
##  $ Current.Ver   : chr  "1.0.0" "2.0.0" "1.2.4" "Varies with device" ...
##  $ Android.Ver   : chr  "4.0.3 and up" "4.0.3 and up" "4.0.3 and up" "4.2 and up" ...

Display dataset

# First 6 rows of the dataset (head() shows 6 rows by default)
head(data_apps)
##                                                  App       Category Rating
## 1     Photo Editor & Candy Camera & Grid & ScrapBook ART_AND_DESIGN    4.1
## 2                                Coloring book moana ART_AND_DESIGN    3.9
## 3 U Launcher Lite – FREE Live Cool Themes, Hide Apps ART_AND_DESIGN    4.7
## 4                              Sketch - Draw & Paint ART_AND_DESIGN    4.5
## 5              Pixel Draw - Number Art Coloring Book ART_AND_DESIGN    4.3
## 6                         Paper flowers instructions ART_AND_DESIGN    4.4
##   Reviews Size    Installs Type Price Content.Rating                    Genres
## 1     159  19M     10,000+ Free     0       Everyone              Art & Design
## 2     967  14M    500,000+ Free     0       Everyone Art & Design;Pretend Play
## 3   87510 8.7M  5,000,000+ Free     0       Everyone              Art & Design
## 4  215644  25M 50,000,000+ Free     0           Teen              Art & Design
## 5     967 2.8M    100,000+ Free     0       Everyone   Art & Design;Creativity
## 6     167 5.6M     50,000+ Free     0       Everyone              Art & Design
##       Last.Updated        Current.Ver  Android.Ver
## 1  January 7, 2018              1.0.0 4.0.3 and up
## 2 January 15, 2018              2.0.0 4.0.3 and up
## 3   August 1, 2018              1.2.4 4.0.3 and up
## 4     June 8, 2018 Varies with device   4.2 and up
## 5    June 20, 2018                1.1   4.4 and up
## 6   March 26, 2017                1.0   2.3 and up

Description of the Dataset Columns

  1. App: The name of the application, represented as a character string.
  2. Category: The main category of the app, such as “ART_AND_DESIGN,” represented as a character string.
  3. Rating: The average user rating of the app, recorded as a numeric value.
  4. Reviews: The total number of user reviews for the app, shown as a character string.
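  5. Size: The size of the application, stored as a character string with a unit suffix (e.g., “19M”) or the text “Varies with device.”
  6. Installs: The approximate number of installations for the app, stored as a character string with thousands separators and a trailing “+” (e.g., “10,000+”).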
  7. Type: Indicates whether the app is free or paid, represented as a character string.
  8. Price: The price of the app, stored as a character string. Free apps are listed as “0,” while paid apps have a dollar amount.
  9. Content.Rating: The target age group for the app, represented as a character string.
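  10. Genres: The genre(s) of the app, stored as a character string. Multiple genres are separated by a semicolon (e.g., “Art & Design;Pretend Play”).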
  11. Last.Updated: The date of the app’s last update, stored as a character string.
  12. Current.Ver: The current version of the app, represented as a character string.
  13. Android.Ver: The minimum Android version required to run the app, stored as a character string.

Apps

Checking for and removing duplicated apps

# Checking the type of the App 
#typeof(data_apps$App)
#Display all the duplicated Apps
duplicate_apps <- aggregate(App ~ ., data = data_apps, FUN = length)  
duplicate_apps <- duplicate_apps[duplicate_apps$App > 1, ] 
duplicate_apps <- duplicate_apps[order(-duplicate_apps$App), ] 

#View(duplicate_apps)
#print(duplicate_apps)

print(paste("Number of duplicated Apps:",nrow(duplicate_apps)))
## [1] "Number of duplicated Apps: 404"
# Remove rows with a missing App name, then keep only the first occurrence of each App name
data_clean <- data_apps[!is.na(data_apps$App), ] 
data_clean <- data_clean[!duplicated(data_clean$App), ] 

#(After removing the duplicates) Unique values
unique_apps <- length(unique(data_clean$App))
print(paste("Number of unique apps after removing the duplicates:", unique_apps))
## [1] "Number of unique apps after removing the duplicates: 9660"

Duplicate App Analysis:

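  • The aggregate() check counts groups of rows that are identical in every column. It finds 404 such groups, each repeated two or three times. Because aggregate() drops rows containing NA by default, rows with a missing Rating are not counted here.
  • After removing duplicates by App name, the dataset contains 9660 unique apps.
  • Total rows removed: 10841 - 9660 = 1181.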
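Deduplicating by App name, as data_clean does, is stricter than removing exact copies. It also drops rows for the same app that differ in some other column, such as the review count. The minimal sketch below uses hypothetical toy data, not the real dataset, to show the difference between the two checks.

```r
# Hypothetical toy data: "Maps" appears twice with different review counts,
# "Chess" appears twice as an exact copy
toy <- data.frame(
  App     = c("Maps", "Maps", "Chess", "Chess", "Notes"),
  Reviews = c("100", "120", "50", "50", "7")
)

# Exact duplicates: rows identical in every column
exact_dup <- duplicated(toy)
# Name-based duplicates: any repeat of an App name after its first occurrence
name_dup <- duplicated(toy$App)

sum(exact_dup)  # only the second "Chess" row
sum(name_dup)   # the second "Maps" row and the second "Chess" row

# Keep the first occurrence of each name (the approach used for data_clean)
toy_unique <- toy[!name_dup, ]
toy_unique
```

On the real data, sum(duplicated(data_apps)) would count exact copies, and sum(duplicated(data_apps$App)) would count every repeated name.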

After dropping duplicates

str(data_clean$App)
##  chr [1:9660] "Photo Editor & Candy Camera & Grid & ScrapBook" ...

Price

Converting Price to numeric

Prices of paid apps carry a leading ‘$’ sign (e.g., “$4.99”). We remove it before the conversion.

# Check the storage type of Price before conversion
typeof(data_clean$Price)
## [1] "character"
# Remove dollar symbols and convert to numeric
data_clean$Price <- as.numeric(gsub("\\$", "", data_clean$Price))

The dollar signs are removed successfully. One value could not be converted to a number and became NA (see the summary below).
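It helps to find out which raw strings failed to convert before dropping them. The sketch below uses a hypothetical vector. "Everyone" is an assumed stand-in for a misaligned value, not necessarily what the real file contains. Setting fixed = TRUE makes gsub() treat "$" literally, so the pattern needs no regex escaping.

```r
# Hypothetical raw price strings; "Everyone" stands in for a misaligned value
raw_price <- c("0", "$4.99", "$0.99", "Everyone")

# fixed = TRUE matches "$" literally instead of as a regex anchor
stripped <- gsub("$", "", raw_price, fixed = TRUE)
# suppressWarnings() hides the "NAs introduced by coercion" warning
price_num <- suppressWarnings(as.numeric(stripped))

# Raw values that could not be converted
raw_price[is.na(price_num)]
```

On the real data, the equivalent check would be data_apps$Price[is.na(data_clean$Price)] applied before any rows are removed. Note that this only lines up if both vectors have the same rows, which holds only before deduplication.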

# Summary statistics for price
summary(data_clean$Price)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   0.000   0.000   0.000   1.099   0.000 400.000       1

The summary shows one missing value (NA) in the Price column. Let’s handle it.

Checking for missing values in Price

missing_na <- is.na(data_clean$Price)    
missing_blank <- data_clean$Price == "" 

sum(missing_na)
## [1] 1
sum(missing_blank, na.rm = TRUE)
## [1] 0
# Remove the row where Price is NA (Price is numeric now, so a blank-string check is no longer needed)
data_clean <- data_clean[!is.na(data_clean$Price), ]

We removed one row (row 10473 of the original file). The app in that row has no valid category name, so the row is not relevant to our analysis.

#Recheck for missing values
summary(data_clean$Price)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   0.000   0.000   1.099   0.000 400.000
Missing values in Price were removed successfully.

Type

#Checking the type of Type variable
table(data_clean$Type)
## 
## Free Paid 
## 8902  756

The Price column has 8903 apps priced at 0, but the Type column counts only 8902 “Free” apps. The Free and Paid counts also add up to 9658, one short of the 9659 rows. So one Type value is misrecorded. Let’s check.

#Checking for Missing values
print(paste("Missing values:",sum(is.na(data_clean$Type))))
## [1] "Missing values: 0"
data_clean[is.na(data_clean$Type), ]
##  [1] App            Category       Rating         Reviews        Size          
##  [6] Installs       Type           Price          Content.Rating Genres        
## [11] Last.Updated   Current.Ver    Android.Ver   
## <0 rows> (or 0-length row.names)
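# The missing Type is stored as the literal string "NaN", not as NA,
# so is.na() alone does not find it; replace both cases with "Free"
data_clean$Type[is.na(data_clean$Type) | data_clean$Type == "NaN"] <- "Free"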

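One row (row 9150) has a missing Type. It is stored as the string “NaN” rather than as NA, which is why is.na() reports no missing values. Since the app’s price is 0, we replace its Type with “Free”.

Missing values in Type were handled successfully.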

Size

# Checking the type of the Size 
typeof(data_apps$Size)
## [1] "character"
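Size is stored as text with a unit suffix. The ifelse() used below evaluates both branches on every element, so it emits coercion warnings. The hypothetical helper parse_size() is an alternative that only converts strings matching the expected pattern. It is a sketch, assuming sizes look like "19M" or "201k", and it takes 1 MB as 1000 kB to match the 0.001 factor used in this report.

```r
# Hypothetical helper: convert Size strings such as "19M", "201k",
# or "Varies with device" to megabytes (1 MB = 1000 kB, as in the report)
parse_size <- function(size) {
  size <- trimws(size)
  mb <- rep(NA_real_, length(size))
  is_m <- grepl("^[0-9.]+M$", size)  # megabyte values
  is_k <- grepl("^[0-9.]+k$", size)  # kilobyte values
  mb[is_m] <- as.numeric(sub("M$", "", size[is_m]))
  mb[is_k] <- as.numeric(sub("k$", "", size[is_k])) / 1000
  mb  # anything else (e.g. "Varies with device") stays NA
}

sizes <- parse_size(c("19M", "201k", "8.7M", "Varies with device"))
sizes
```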

Replacing missing values in Size with the category mean

# Replace "Varies with device" in the Size column with NA
data_clean$Size[data_clean$Size == "Varies with device"] <- NA
# Drop any row whose Size contains "+" (not a valid size value)
data_clean <- data_clean[!grepl("\\+", data_clean$Size), ]
# Convert kilobytes ("k") to MB; otherwise strip the "M" for megabytes
data_clean$Size <- ifelse(grepl("k", data_clean$Size),
                          as.numeric(gsub("k", "", data_clean$Size)) * 0.001,
                          as.numeric(gsub("M", "", data_clean$Size)))
# Calculate and display the mean size (MB) for each Category
mean_size_by_type <- tapply(data_clean$Size, data_clean$Category,
                            mean, na.rm = TRUE)
print(mean_size_by_type)
##      ART_AND_DESIGN   AUTO_AND_VEHICLES              BEAUTY BOOKS_AND_REFERENCE 
##           12.370968           20.037147           13.795745           13.134701 
##            BUSINESS              COMICS       COMMUNICATION              DATING 
##           13.867194           13.794959           11.307430           15.661119 
##           EDUCATION       ENTERTAINMENT              EVENTS              FAMILY 
##           19.057101           23.043750           13.963754           27.187988 
##             FINANCE      FOOD_AND_DRINK                GAME  HEALTH_AND_FITNESS 
##           17.368127           20.494318           41.866609           20.669707 
##      HOUSE_AND_HOME  LIBRARIES_AND_DEMO           LIFESTYLE MAPS_AND_NAVIGATION 
##           15.970258           10.602883           14.844916           16.368121 
##             MEDICAL  NEWS_AND_MAGAZINES           PARENTING     PERSONALIZATION 
##           19.189399           12.470189           22.512963           11.224624 
##         PHOTOGRAPHY        PRODUCTIVITY            SHOPPING              SOCIAL 
##           15.666158           12.342505           15.491435           15.984090 
##              SPORTS               TOOLS    TRAVEL_AND_LOCAL       VIDEO_PLAYERS 
##           24.058361            8.782837           24.204410           15.792756 
##             WEATHER 
##           12.680036
# Replace NA values in Size with the rounded mean size of the row's Category (vectorized; no explicit loop)
data_clean$Size <- ifelse(
  is.na(data_clean$Size),  # Check if Size is NA
  round(mean_size_by_type[data_clean$Category], 1),  # Replace with the mean size based on the Category
  data_clean$Size  # Keep the original size if it's not NA
)
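The named-vector lookup above works. An equivalent base-R idiom is ave(), which computes a group statistic and returns it aligned to every row, so no separate tapply() step is needed. Below is a sketch on hypothetical toy data.

```r
# Hypothetical toy data with missing sizes
toy <- data.frame(
  Category = c("GAME", "GAME", "GAME", "TOOLS", "TOOLS"),
  Size     = c(40, NA, 60, 8, NA)
)

# ave() returns, for every row, the mean Size of that row's Category
cat_mean <- ave(toy$Size, toy$Category, FUN = function(s) mean(s, na.rm = TRUE))

# Fill only the missing entries, rounded to one decimal as in the report
toy$Size <- ifelse(is.na(toy$Size), round(cat_mean, 1), toy$Size)
toy$Size
```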

Installs

Steps: remove the ‘+’ sign, remove the commas, and convert to numeric.

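# Clean install counts such as "10,000+" into numbers
clean_installs <- function(Installs) {
  Installs <- gsub("\\+", "", Installs)  # drop the trailing "+"
  Installs <- gsub(",", "", Installs)    # drop thousands separators
  return(as.numeric(Installs))
}

# gsub() and as.numeric() are vectorized, so call the function on the whole column
# (sapply() would also attach the original strings as names)
data_clean$Installs <- clean_installs(data_clean$Installs)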

# Flag rows where Size or Installs failed to convert
# (as.numeric() yields NA, not NaN, for unparseable strings)
bad_rows <- is.na(data_clean$Size) | is.na(data_clean$Installs)
sum(bad_rows)

# Display only those rows
data_clean[bad_rows, ]
library(DT)  # interactive table widget
datatable(data_clean, options = list(scrollX = TRUE))

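Imputing missing ratings and sorting the unique install values

library(dplyr)
# Replace missing ratings with the overall mean rating
data_clean <- data_clean %>%
  mutate(Rating = ifelse(is.na(Rating), mean(Rating, na.rm = TRUE), Rating))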

# Identify the unique values in the 'Installs' column
unique_values <- unique(data_clean$Installs)

# Display the unique values
#print(unique_values)

# Installs is already numeric at this point, so a plain numeric sort orders the install tiers
sorted_values <- sort(unique_values)
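Converting the install tiers to numbers first matters for ordering. Sorting the raw strings compares text rather than magnitudes, so the result need not follow the install counts and can depend on the locale. Below is a small sketch on example tier strings in the raw file's format.

```r
# Install tiers as stored in the raw file (character)
tiers <- c("500,000+", "10,000+", "5,000,000+", "1,000+")

# Character sort compares text, not magnitudes (result depends on locale)
sort(tiers)

# Numeric order after stripping "," and "+"
tier_num <- as.numeric(gsub("[,+]", "", tiers))
tiers[order(tier_num)]
```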

Rating and Reviews

# Checking the type of the Rating 
typeof(data_clean$Rating)
## [1] "double"
# Checking the type of the Reviews 
typeof(data_clean$Reviews)
## [1] "character"

Checking the format of Rating and Reviews

##  chr [1:9659] "159" "967" "87510" "215644" "967" "167" "178" "36815" ...
##  num [1:9659] 4.1 3.9 4.7 4.5 4.3 4.4 3.8 4.1 4.4 4.7 ...

The Reviews column is stored as a character string, so we convert it to a numeric type for further analysis.

Checking the unique values for reviews and rating

unique_values <- unique(data_clean$Reviews)
unique_values1 <- unique(data_clean$Rating)
# Display the unique values
#print(unique_values)
#print(unique_values1)

Converting Reviews from character to numeric
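The conversion code is not shown in the report. The following is a minimal sketch, assuming every Reviews value is a digit-only string such as "159". It uses as.numeric() because the structure output below lists Reviews as num. The guard stops if any value fails to convert.

```r
# Hypothetical sketch: the report's own conversion code is not shown.
# Assumes Reviews values are digit-only strings such as "159".
reviews_raw <- c("159", "967", "87510")
reviews_num <- as.numeric(reviews_raw)

# Guard: stop if any value failed to convert (it would be NA)
stopifnot(!anyNA(reviews_num))
reviews_num
```

Applied to the data frame, the same idea would read data_clean$Reviews <- as.numeric(data_clean$Reviews).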

## 'data.frame':    9659 obs. of  13 variables:
##  $ App           : chr  "Photo Editor & Candy Camera & Grid & ScrapBook" "Coloring book moana" "U Launcher Lite – FREE Live Cool Themes, Hide Apps" "Sketch - Draw & Paint" ...
##  $ Category      : chr  "ART_AND_DESIGN" "ART_AND_DESIGN" "ART_AND_DESIGN" "ART_AND_DESIGN" ...
##  $ Rating        : num  4.1 3.9 4.7 4.5 4.3 4.4 3.8 4.1 4.4 4.7 ...
##  $ Reviews       : num  159 967 87510 215644 967 ...
##  $ Size          : num  19 14 8.7 25 2.8 5.6 19 29 33 3.1 ...
##  $ Installs      : num  1e+04 5e+05 5e+06 5e+07 1e+05 5e+04 5e+04 1e+06 1e+06 1e+04 ...
##  $ Type          : chr  "Free" "Free" "Free" "Free" ...
##  $ Price         : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Content.Rating: chr  "Everyone" "Everyone" "Everyone" "Teen" ...
##  $ Genres        : chr  "Art & Design" "Art & Design;Pretend Play" "Art & Design" "Art & Design" ...
##  $ Last.Updated  : chr  "January 7, 2018" "January 15, 2018" "August 1, 2018" "June 8, 2018" ...
##  $ Current.Ver   : chr  "1.0.0" "2.0.0" "1.2.4" "Varies with device" ...
##  $ Android.Ver   : chr  "4.0.3 and up" "4.0.3 and up" "4.0.3 and up" "4.2 and up" ...
Table: Statistics summary.

Statistic     Rating     Reviews       Size    Installs     Price
Min            1.000           0     0.0085   0.000e+00     0.000
1st Qu.        4.000          25     5.3000   1.000e+03     0.000
Median         4.200         967    13.1000   1.000e+05     0.000
Mean           4.173      216593    20.1512   7.778e+06     1.099
3rd Qu.        4.500       29401    27.0000   1.000e+06     0.000
Max            5.000    78158306   100.0000   1.000e+09   400.000

Character columns (App, Category, Type, Content.Rating, Genres, Last.Updated, Current.Ver, Android.Ver): length 9659, class character.

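The Rating column had 1463 missing values before imputation. They appear in the frequency table below as the 1463 ratings equal to the overall mean, 4.17324304538799.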

As observed, apps in the FAMILY category have the most missing ratings. Rather than dropping these rows, we replace missing ratings with the overall mean rating.
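Counting missing ratings per category is one way to check which category has the most NA values. The sketch below uses hypothetical toy data. On the real data it would need to run on data_clean before the mean imputation.

```r
# Hypothetical toy data standing in for the data before imputation
toy <- data.frame(
  Category = c("FAMILY", "FAMILY", "FAMILY", "GAME", "TOOLS"),
  Rating   = c(NA, NA, 4.1, NA, 3.9)
)

# Count missing ratings per category, largest first
na_by_cat <- sort(tapply(is.na(toy$Rating), toy$Category, sum), decreasing = TRUE)
na_by_cat
```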

Checking Rating for outliers using the frequency of each rating value

frequency_table <- table(data_clean$Rating)
frequency_table
## 
##                1              1.2              1.4              1.5 
##               16                1                3                3 
##              1.6              1.7              1.8              1.9 
##                4                8                8               11 
##                2              2.1              2.2              2.3 
##               12                8               14               20 
##              2.4              2.5              2.6              2.7 
##               19               20               24               23 
##              2.8              2.9                3              3.1 
##               40               45               81               69 
##              3.2              3.3              3.4              3.5 
##               63              100              126              156 
##              3.6              3.7              3.8              3.9 
##              167              224              286              359 
##                4              4.1 4.17324304538799              4.2 
##              513              621             1463              810 
##              4.3              4.4              4.5              4.6 
##              897              895              848              683 
##              4.7              4.8              4.9                5 
##              442              221               85              271

All ratings lie between 1 and 5, and most are 4 or above.

Replacing NA values in Rating with mean

# Replace NA in Rating with the overall mean (already applied above, so re-running this has no effect)
data_clean <- data_clean %>%
  mutate(Rating = ifelse(is.na(Rating), mean(Rating, na.rm = TRUE), Rating))

xkablesummary(data_clean)
Table: Statistics summary.

Statistic     Rating     Reviews       Size    Installs     Price
Min            1.000           0     0.0085   0.000e+00     0.000
1st Qu.        4.000          25     5.3000   1.000e+03     0.000
Median         4.200         967    13.1000   1.000e+05     0.000
Mean           4.173      216593    20.1512   7.778e+06     1.099
3rd Qu.        4.500       29401    27.0000   1.000e+06     0.000
Max            5.000    78158306   100.0000   1.000e+09   400.000

Character columns (App, Category, Type, Content.Rating, Genres, Last.Updated, Current.Ver, Android.Ver): length 9659, class character.

Now there are no missing values in Rating.

Category

# Checking the type of the Category 
typeof(data_apps$Category)
## [1] "character"
length(unique(data_clean$Category))
## [1] 33
length(unique(data_clean$Genres))
## [1] 118

There are 33 categories and 118 genres in the data frame, so each category spans several genres on average. We therefore use the coarser Category variable in the later analyses.
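To see how the genres spread across categories, one option is to count the distinct Genres strings within each Category. Below is a sketch on hypothetical toy data. On the real data, the same call would use data_clean$Genres and data_clean$Category.

```r
# Hypothetical toy data
toy <- data.frame(
  Category = c("FAMILY", "FAMILY", "FAMILY", "GAME", "GAME"),
  Genres   = c("Puzzle", "Casual", "Education;Pretend Play", "Action", "Action")
)

# Number of distinct Genres strings within each Category
genres_per_cat <- tapply(toy$Genres, toy$Category, function(g) length(unique(g)))
genres_per_cat
```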

Below is the graph for the distribution of Categories for the dataset after removing duplicates.

Current Version & Genres

Because the values in Current.Ver are inconsistently formatted (e.g., “1.0.0” vs. “Varies with device”), this column is dropped and excluded from the analysis. Genres is dropped as well, since the analysis uses Category instead.

data_final <- data_clean %>% select(-c('Genres', 'Current.Ver'))
data_final$Category <- as.factor(data_final$Category)
data_final$Android.Ver <- as.factor(data_final$Android.Ver)

Content Rating, Last Updated

# Remove leading and trailing spaces and convert all text to a consistent format 
data_final$Content.Rating <- trimws(tolower(data_final$Content.Rating))

cr_missing <- sum(is.na(data_final$Content.Rating))

print(paste("Number of missing values in 'Content Rating':", cr_missing))
## [1] "Number of missing values in 'Content Rating': 0"

There are no missing values for Content rating.

# Convert Last Updated to Date format
data_final$Last.Updated <- as.Date(data_final$Last.Updated, format = "%B %d, %Y")

# Verify the cleaning (cat() interprets "\n"; print() would show it literally)
cat("\nSummary of Last.Updated after cleaning:\n")
print(summary(data_final$Last.Updated))
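The "%B %d, %Y" format matches strings such as "January 7, 2018". %B matches month names in the current locale, so this assumes an English locale. Under another locale every date may come back NA. The sketch below uses example strings in the raw file's format and also computes how many days each app lags behind the most recent update. That recency measure is an illustration, not part of the report.

```r
# Example Last.Updated strings in the raw file's format
raw_dates <- c("January 7, 2018", "August 1, 2018", "March 26, 2017")

# %B = full month name (locale-dependent), %d = day, %Y = four-digit year
updated <- as.Date(raw_dates, format = "%B %d, %Y")

# Days between each update and the most recent update in the sample
days_since_latest <- as.numeric(max(updated) - updated)
days_since_latest
```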

After cleaning the Data

str(data_final)
## 'data.frame':    9659 obs. of  11 variables:
##  $ App           : chr  "Photo Editor & Candy Camera & Grid & ScrapBook" "Coloring book moana" "U Launcher Lite – FREE Live Cool Themes, Hide Apps" "Sketch - Draw & Paint" ...
##  $ Category      : Factor w/ 33 levels "ART_AND_DESIGN",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Rating        : num  4.1 3.9 4.7 4.5 4.3 4.4 3.8 4.1 4.4 4.7 ...
##  $ Reviews       : num  159 967 87510 215644 967 ...
##  $ Size          : num  19 14 8.7 25 2.8 5.6 19 29 33 3.1 ...
##  $ Installs      : num  1e+04 5e+05 5e+06 5e+07 1e+05 5e+04 5e+04 1e+06 1e+06 1e+04 ...
##  $ Type          : chr  "Free" "Free" "Free" "Free" ...
##  $ Price         : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Content.Rating: chr  "everyone" "everyone" "everyone" "teen" ...
##  $ Last.Updated  : Date, format: "2018-01-07" "2018-01-15" ...
##  $ Android.Ver   : Factor w/ 34 levels "1.0 and up","1.5 and up",..: 16 16 16 19 21 9 16 19 11 16 ...

Data Exploration and Visualization

Visualization for Price Distribution

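library(ggplot2)
# Histogram of the Price distribution
# coord_cartesian() zooms without dropping data; scale limits set with ylim()
# would remove out-of-range bars, such as the bar at 0 whose count exceeds 500
ggplot(data_final, aes(x = Price)) +
  geom_histogram(binwidth = 2, fill = "pink", color = "black") +
  coord_cartesian(xlim = c(0, 500), ylim = c(0, 500)) +
  labs(title = "Price Distribution", x = "Price", y = "Frequency") +
  theme_minimal()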

The distribution is highly right-skewed: the vast majority of apps have a price of 0.

# Boxplot for the same
ggplot(data_final, aes(y=Price)) +
  geom_boxplot(outlier.colour = "red", outlier.shape = 16, outlier.size = 1, fill="pink", color="black") +
  labs(title="Price Boxplot", y="Price") +
  theme_minimal()

Checking outliers for Price

outlierKD2 <- function(df, var, rm = FALSE, boxplt = FALSE, histogram = TRUE, qqplt = FALSE) {
  # Note: 'rm' is accepted but not used in this version; df is never modified
  dt <- df  # Copy of the data frame (not modified below)
  var_name <- eval(substitute(var), eval(dt))
  na1 <- sum(is.na(var_name))
  m1 <- mean(var_name, na.rm = TRUE)
  colTotal <- boxplt + histogram + qqplt  # Calculate the total number of charts to be displayed
  par(mfrow = c(2, max(2, colTotal)), oma = c(0, 0, 3, 0))  # Adjust layout for plots
  
  # Plots of the original data (outliers included)
  if (qqplt) {
    qqnorm(var_name, main = "Q-Q plot with Outliers")
    qqline(var_name)
  }
  
  if (histogram) { 
    hist(var_name, main = "Histogram with Outliers", xlab = NA, ylab = NA) 
  }
  
  if (boxplt) { 
    boxplot(var_name, main = "Box Plot with Outliers")
  }
  
  # Identify outliers (1.5 * IQR rule) and set them to NA
  outlier <- boxplot.stats(var_name)$out
  mo <- mean(outlier)
  var_name <- ifelse(var_name %in% outlier, NA, var_name)
  
  # Plots after removing the outliers
  if (qqplt) {
    qqnorm(var_name, main = "Q-Q plot without Outliers")
    qqline(var_name)
  }
  
  if (histogram) { 
    hist(var_name, main = "Histogram without Outliers", xlab = NA, ylab = NA) 
  }
  
  if (boxplt) { 
    boxplot(var_name, main = "Box Plot without Outliers") 
  }
  
  # Add the title for the overall plot section if any plots are displayed
  if (colTotal > 0) {
    title("Outlier Check", outer = TRUE)
    na2 <- sum(is.na(var_name))
    cat("Outliers identified:", na2 - na1, "\n")
    cat("Proportion (%) of outliers:", round((na2 - na1) / sum(!is.na(var_name)) * 100, 1), "\n")
    cat("Mean of the outliers:", round(mo, 2), "\n")
    cat("Mean without removing outliers:", round(m1, 2), "\n")
    cat("Mean if we remove outliers:", round(mean(var_name, na.rm = TRUE), 2), "\n")
  }
}
# The outlierKD2() function is defined in the previous code chunk.
outlier_check_price = outlierKD2(data_final, Price, rm = FALSE, boxplt = TRUE, qqplt = TRUE)

## Outliers identified: 756 
## Proportion (%) of outliers: 8.5 
## Mean of the outliers: 14.05 
## Mean without removing outliers: 1.1 
## Mean if we remove outliers: 0

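The price values in the dataset, including the extreme ones, are valid observations. Because more than 75% of apps are free, the first and third quartiles of Price are both 0. The boxplot rule therefore flags every paid app as an outlier: the 756 outliers match the number of paid apps. Removing these outliers would remove all paid apps, so we keep them.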
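The boxplot rule flags values beyond 1.5 × IQR from the quartiles. When more than 75% of the values are identical, as with free apps, the IQR is 0 and every other value is flagged. Below is a small illustration on a hypothetical price vector using base R's boxplot.stats(), the same function outlierKD2() relies on.

```r
# Hypothetical price vector: mostly free apps plus a few paid ones
price <- c(rep(0, 20), 0.99, 4.99, 400)

stats <- boxplot.stats(price)
stats$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
IQR(price)   # 0, because more than 75% of the prices are 0
stats$out    # every non-zero price is flagged as an outlier
```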

#To check the value ranges
table(data_final$Price)
## 
##      0   0.99      1   1.04    1.2   1.26   1.29   1.49    1.5   1.59   1.61 
##   8903    145      3      1      1      1      1     46      1      1      1 
##    1.7   1.75   1.76   1.96   1.97   1.99      2   2.49    2.5   2.56   2.59 
##      2      1      1      1      1     73      3     25      1      1      1 
##    2.6    2.9   2.95   2.99   3.02   3.04   3.08   3.28   3.49   3.61   3.88 
##      1      1      1    124      1      1      1      1      7      1      1 
##    3.9   3.95   3.99   4.29   4.49   4.59    4.6   4.77    4.8   4.84   4.85 
##      1      1     57      1      9      1      1      1      1      1      1 
##   4.99      5   5.49   5.99   6.49   6.99   7.49   7.99   8.49   8.99      9 
##     70      1      5     26      5     11      2      7      2      5      1 
##   9.99     10  10.99  11.99  12.99  13.99     14  14.99  15.46  15.99  16.99 
##     19      2      2      3      4      2      1      9      1      1      2 
##  17.99  18.99   19.4   19.9  19.99  24.99  25.99  28.99  29.99  30.99  33.99 
##      2      1      1      1      5      3      1      1      5      1      1 
##  37.99  39.99  46.99  74.99  79.99  89.99 109.99 154.99    200 299.99 379.99 
##      1      2      1      1      1      1      1      1      1      1      1 
## 389.99 394.99 399.99    400 
##      1      1     12      1

As already mentioned, 8903 apps have a price of 0 (free apps).

Visualization for Type Distribution

# Bar Plot for the Type Distribution
ggplot(data_final, aes(x = Type)) +
  geom_bar(fill = "pink", color = "black") +
  labs(title = "Distribution of App Types (Free vs Paid)", x = "Type", y = "Count") +
  theme_minimal()

Free apps clearly outnumber paid apps.

# Display statistics for the Price of apps grouped by their Type
data_final$Type <- as.factor(data_final$Type)

# Use data_final for both the labels and the grouped statistics so the rows line up
summary_by_type <- data.frame(
  Type = levels(data_final$Type),
  Min_Price = tapply(data_final$Price, data_final$Type, min, na.rm = TRUE),
  Max_Price = tapply(data_final$Price, data_final$Type, max, na.rm = TRUE),
  Mean_Price = tapply(data_final$Price, data_final$Type, mean, na.rm = TRUE),
  Median_Price = tapply(data_final$Price, data_final$Type, median, na.rm = TRUE)
)


print(summary_by_type)
##      Type Min_Price Max_Price Mean_Price Median_Price
## Free Free      0.00         0    0.00000         0.00
## NaN   NaN      0.00         0    0.00000         0.00
## Paid Paid      0.99       400   14.04515         2.99
# Box plot of the price distribution by app type
ggplot(data_final, aes(x = Type, y = Price, fill = Type)) +
  geom_boxplot() +
  labs(title = "Price Distribution by App Type", 
       x = "App Type", 
       y = "Price ($)") +
  theme_minimal()

Histogram for price distribution by App Type

ggplot(data_final, aes(x = Price, fill = Type)) +
  geom_histogram(binwidth = 60, alpha = 0.7, position = "identity") +
  facet_wrap(~ Type) +
  labs(title = "Price Distribution by App Type", 
       x = "Price ($)", 
       y = "Count") +
  theme_minimal()

Analyzing the price distribution across app types showed that the Type column does not always agree with the Price values (see the summary table above). Since Price alone determines whether an app is free (price 0) or paid, the Type column is redundant.

Hence, we remove the Type column.
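Price fully determines whether an app is free or paid. One way to confirm that Type is redundant is to derive the label from Price and list the rows where the stored Type disagrees. Below is a sketch on hypothetical toy data. The "NaN" entry mirrors the extra Type group seen in the summary above.

```r
# Hypothetical toy data; "NaN" mimics the misrecorded Type value
toy <- data.frame(
  Price = c(0, 0, 2.99, 0),
  Type  = c("Free", "NaN", "Paid", "Free")
)

# Type implied by Price: free exactly when the price is 0
derived_type <- ifelse(toy$Price == 0, "Free", "Paid")

# Rows where the stored Type disagrees with Price
mismatch <- toy[derived_type != toy$Type, ]
mismatch
```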

Dropping the Type column

#Using subset function
data_final <- subset(data_final, select = -Type)
#After removing the Type column and duplicated values
str(data_final)
## 'data.frame':    9659 obs. of  10 variables:
##  $ App           : chr  "Photo Editor & Candy Camera & Grid & ScrapBook" "Coloring book moana" "U Launcher Lite – FREE Live Cool Themes, Hide Apps" "Sketch - Draw & Paint" ...
##  $ Category      : Factor w/ 33 levels "ART_AND_DESIGN",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Rating        : num  4.1 3.9 4.7 4.5 4.3 4.4 3.8 4.1 4.4 4.7 ...
##  $ Reviews       : num  159 967 87510 215644 967 ...
##  $ Size          : num  19 14 8.7 25 2.8 5.6 19 29 33 3.1 ...
##  $ Installs      : num  1e+04 5e+05 5e+06 5e+07 1e+05 5e+04 5e+04 1e+06 1e+06 1e+04 ...
##  $ Price         : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Content.Rating: chr  "everyone" "everyone" "everyone" "teen" ...
##  $ Last.Updated  : Date, format: "2018-01-07" "2018-01-15" ...
##  $ Android.Ver   : Factor w/ 34 levels "1.0 and up","1.5 and up",..: 16 16 16 19 21 9 16 19 11 16 ...
The Type column is successfully removed.

From here on, we perform bivariate analysis of Price against other variables.

Visualization for Price vs Installs

# Scatter plot of Price vs. installs; log1p(x) = log(1 + x) keeps apps with 0 installs finite
ggplot(data_final, aes(x = Price, y = log1p(Installs))) +
  geom_point(color = 'red', size = 1, alpha = 0.5) + 
  geom_smooth(method = 'lm', color = 'blue', se = FALSE) +
  labs(title = "Price vs Installs", x = "Price (USD)", y = "log(1 + Installs)") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

The scatter plot suggests that free apps (price 0) tend to have more installs than paid apps.

# Categorize the apps as "Free" or "Paid" based on Price
Price_Category <- ifelse(data_final$Price == 0, "Free", "Paid")
str(data_final$Price)
##  num [1:9659] 0 0 0 0 0 0 0 0 0 0 ...
str(Price_Category)
##  chr [1:9659] "Free" "Free" "Free" "Free" "Free" "Free" "Free" "Free" ...
#str(log(data_clean$Installs))

For a clearer comparison, we label apps with a price of 0 as “Free” and all others as “Paid”, and compare their log-transformed installs with a box plot.

# Box plot of Price Category vs. log-transformed installs (log1p avoids -Inf for apps with 0 installs)
ggplot(data_final, aes(x = Price_Category, y = log1p(Installs))) +
  geom_boxplot(fill = "lightblue", color = "darkblue", alpha = 0.6) +
  labs(title = "Price Categories vs. Log-Transformed Installs", 
       x = "Price Category", 
       y = "log(1 + Installs)") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))  

“Free” apps tend to have more installs than “Paid” apps. The difference between the means on the log scale is estimated to be between 3.47 and 3.97.
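The report does not show how the interval for the difference in means was obtained. One standard way to get such an interval is a Welch two-sample t-test on log-transformed installs. The sketch below uses hypothetical install counts, so its interval will not match the report's numbers.

```r
# Hypothetical install counts for a handful of free and paid apps
installs <- c(1e6, 5e6, 1e5, 1e7, 5e5,   # free
              1e3, 5e3, 1e4, 100, 1e3)   # paid
price_cat <- rep(c("Free", "Paid"), each = 5)

# log1p() = log(1 + x), which stays finite for apps with 0 installs
log_inst <- log1p(installs)

# Welch two-sample t-test (R's default: var.equal = FALSE)
tt <- t.test(log_inst ~ price_cat)
tt$conf.int  # 95% CI for mean(Free) - mean(Paid) on the log scale
```

On the real data, the analogous call would be t.test(log1p(Installs) ~ Price_Category, data = data_duplicate).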


table(Price_Category)
## Price_Category
## Free Paid 
## 8903  756
# Add Price_Category as a column of a copy of data_final
data_duplicate <- data_final
data_duplicate$Price_Category <- ifelse(data_final$Price == 0, "Free", "Paid")

# Summarize log installs per Price_Category; using the Installs column (not data_clean$Installs)
# keeps the mean within each group, and log1p() avoids -Inf for apps with 0 installs
summary_table <- data_duplicate %>%
  group_by(Price_Category) %>%
  summarise(Average_Log_Installs = mean(log1p(Installs), na.rm = TRUE),
            Count = n())

# View the summarized table (kable() is from knitr, kable_styling() from kableExtra)
library(knitr)
library(kableExtra)
kable(summary_table, format = "html", col.names = c("Price Category", "Mean log(1 + Installs)", "App Count")) %>%
  kable_styling(full_width = FALSE, position = "center")

Visualization for Price vs Reviews & Rating

# Plot Price vs. Reviews
ggplot(data_final, aes(x=Price, y=Reviews)) +
  geom_point(color = 'blue') +
  geom_smooth(method = 'lm', color = 'red', se = FALSE) +
  labs(title = "Price vs Reviews", x = "Price (USD)", y = "Number of Reviews") +
  theme_minimal() + 
  theme(
    panel.background = element_rect(fill = "white"),  # Set panel background to white
    plot.background = element_rect(fill = "white"),   # Set plot background to white
    axis.text.x = element_text(angle = 45, hjust = 1)
  )

# Plot Price vs. Rating
ggplot(data_final, aes(x=Price, y=Rating)) +
  geom_point(color = 'green') +
  geom_smooth(method = 'lm', color = 'red', se = FALSE) +
  labs(title = "Price vs Rating", x = "Price (USD)", y = "Rating") +
  theme_minimal() + 
  theme(
    panel.background = element_rect(fill = "white"),  # Set panel background to white
    plot.background = element_rect(fill = "white"),   # Set plot background to white
    axis.text.x = element_text(angle = 45, hjust = 1)
  )

Price vs Reviews: Cheaper apps tend to have more reviews, indicating higher popularity or a larger user base. In contrast, expensive apps tend to have fewer reviews, possibly because fewer people buy higher-priced apps.

Price vs Rating: Price is not strongly associated with the average rating, but there is a slight trend: lower-priced apps show more variation in ratings, while higher-priced apps tend to receive more consistent ratings around 4. Perhaps higher-priced apps are meeting customer expectations.

Visualization for Price vs Ratings and Reviews, Colored by Installs

# Scatter plot of Price vs. Rating, colored by log(Installs + 1)
ggplot(data_final, aes(x = Price, y = Rating, color = log1p(Installs))) +
  geom_point(alpha = 0.6) +
  scale_color_gradient(low = "blue", high = "red") +  
  labs(title = "Price vs. Rating, Colored by Installs", 
       x = "Price (USD)", 
       y = "Rating", 
       color = "log(Installs + 1)") +
  theme_minimal()

# Scatter plot of Price vs. Reviews, colored by log(Installs + 1)
ggplot(data_final, aes(x = Price, y = Reviews, color = log1p(Installs))) +
  geom_point(alpha = 0.6) +
  scale_color_gradient(low = "darkgreen", high = "yellow") +  
  labs(title = "Price vs. Reviews, Colored by Installs", 
       x = "Price (USD)", 
       y = "Reviews", 
       color = "log(Installs + 1)") +
  theme_minimal()

In summary, lower-priced apps account for most of the installs, while higher-priced apps tend to have fewer installs and more scattered ratings. The Price vs. Reviews plot shows the same pattern for reviews.

Visualization for Price vs Size

# Plot Price vs Size
ggplot(data_final, aes(x=Price, y=Size)) +
  geom_point(color = 'red') + 
  geom_smooth(method = 'lm', color = 'blue', se = FALSE) +
  labs(title = "Price vs Size", x = "Price (USD)", y = "App Size (MB)") +
  theme_minimal() 

Visualization for Paid Apps vs Category

# Filter only paid apps
paid_apps <- data_duplicate %>% filter(Price_Category == "Paid")

ggplot(paid_apps, aes(x = Category)) +
  geom_bar(fill = "skyblue", color = "black") +
  labs(title = "Distribution of Categories for Paid Apps",
       x = "App Category",
       y = "Count of Paid Apps") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

A few categories, such as “Family,” “Health and Fitness,” and “Personalization,” have a high count of paid apps, which suggests that users may be more willing to pay for apps in these categories, possibly because of the quality or utility they offer.

Categories like “Dating,” “Weather,” and “Education” have very few paid apps, indicating lower availability or demand for paid versions in these areas.

Visualization for Distribution of Installs

# Create a copy of data_final for factor-level modifications
data_clean1_factor <- data_final  

# Define breakpoints and labels for the Installs factor
breaks <- c(0, 500, 1000, 2500, 5000, 10000, 25000, 50000, 100000, 300000, 1000000, 5000000,10000000,Inf)
labels <- c("0+","500+", "1K+", "2.5K+", "5K+", "10K+", 
            "25K+", "50K+", "100K+", "300K+", "1M+", "5M+","Above 10M+" )

# Convert Installs into a factor with the defined levels
data_clean1_factor$Installs <- cut(data_final$Installs, breaks = breaks, right = FALSE, labels = labels)

# Create a bar plot with the ordered factor
ggplot(data_clean1_factor, aes(x = Installs)) +
  geom_bar() +
  xlab("Installs") +
  ylab("Count") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1), 
        plot.title = element_text(hjust = 0.5))  +  
  ggtitle("Distribution of App Installs")

This plot shows a common trend where many apps have limited user engagement, while a few high-performing apps achieve very high install counts, indicating a polarized distribution of app popularity.

Visualization for Rating Distribution

boxplot(data_final$Rating,ylab = "Rating", xlab = "Count",col = "Blue")

hist(data_clean$Rating, main="Histogram of Apps Rating after cleaning", xlab="Rating (count)", col = 'blue', breaks = 100 )

qqnorm(data_clean$Rating)
qqline(data_clean$Rating, col = "red")

After cleaning, the plots are much clearer but still left-skewed because of the low ratings between 1 and 3 that remain. These low ratings are kept in the dataset, since they may help explain why some apps are rated poorly.

Visualization for Reviews

boxplot(data_final$Reviews,ylab = "Reviews", xlab = "Count",col = 'Blue')

hist(data_final$Reviews, main="Histogram of Apps Reviews", xlab="Reviews (count)", col = 'blue', breaks = 100 )

ggplot(data_final, aes(x = log1p(Reviews))) +  # log1p() keeps apps with 0 reviews finite
  geom_histogram(binwidth = 0.1, fill = "blue", color = "black") +
  labs(title = "Log-Transformed Histogram of Reviews", x = "log(Reviews + 1)", y = "Count")

qqnorm(data_final$Reviews)
qqline(data_final$Reviews, col = "red")

As with ratings, these plots are heavily skewed by outliers: a small number of apps have extremely large review counts, so the raw counts clearly do not follow a normal distribution. The log-transformed histogram compresses this long right tail and is much easier to read, so we use the log of reviews for visualization.
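To make the skewness concrete, the sketch below computes a simple moment-based skewness before and after a log transform. It uses simulated log-normal counts as a stand-in for review counts (an assumption, not the real data) and log1p() so that apps with zero reviews remain finite. A value near 0 indicates a roughly symmetric distribution.

```r
# Sketch on simulated right-skewed "review counts" (not the real data)
set.seed(1)
reviews <- round(rlnorm(5000, meanlog = 7, sdlog = 2.5))

# Simple moment-based sample skewness (0 for a symmetric distribution)
skewness <- function(x) {
  x <- x[is.finite(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

skew_raw <- skewness(reviews)
skew_log <- skewness(log1p(reviews))
c(raw = skew_raw, log = skew_log)
```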

Review frequency table

xkablesummary(data_final)
Table: Statistics summary.
Variable       Min          Q1           Median       Mean         Q3           Max
Rating         1.000        4.000        4.200        4.173        4.500        5.000
Reviews        0            25           967          216593       29401        78158306
Size (MB)      0.0085       5.3000       13.1000      20.1512      27.0000      100.0000
Installs       0.000e+00    1.000e+03    1.000e+05    7.778e+06    1.000e+06    1.000e+09
Price (USD)    0.000        0.000        0.000        1.099        0.000        400.000
Last.Updated   2010-05-21   2017-08-05   2018-05-04   2017-10-30   2018-07-17   2018-08-08

App, Content.Rating: character, length 9659
Category (most frequent): FAMILY 1832, GAME 959, TOOLS 827, BUSINESS 420, MEDICAL 395, PERSONALIZATION 376, (Other) 4850
Android.Ver (most frequent): 4.1 and up 2202, 4.0.3 and up 1395, 4.0 and up 1285, Varies with device 990, 4.4 and up 818, 2.3 and up 616, (Other) 2353
outlierKD2(data_final, Reviews)
## Outliers identified: 1656 
## Proportion (%) of outliers: 20.7 
## Mean of the outliers: 1228141 
## Mean without removing outliers: 216592.6 
## Mean if we remove outliers: 7280.61

To see where these outliers lie, we group the review counts into bins and check which bins contain the most apps; this shows how reviews are distributed.

Binned reviews

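The reviews are binned using custom intervals that widen as the review count grows (the bins do not contain equal numbers of apps), and the average rating is then computed for each bin.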

# Define custom breaks for the review bins (the intervals widen as review counts grow)
breaks <- c(0, 100, 500, 1000, 2500, 5000, 10000, 25000, 50000, 100000, 300000, 1000000, Inf)

# Create a categorical variable based on the new breaks
Review_Category <- cut(data_final$Reviews, breaks = breaks, right = FALSE, 
                   labels = c("0+","100+", "500+", "1K+",
                              "2.5K+", "5K+", "10K+","25K+",
                              "50K+", "100K+","300K+","1M+"))

# Count the number of values in each bin
bin_counts <- as.data.frame(table(Review_Category))

# Rename the columns for clarity
colnames(bin_counts) <- c("Review_Category", "Count")

# Print the counts
print(bin_counts)
##    Review_Category Count
## 1               0+  3327
## 2             100+  1065
## 3             500+   462
## 4              1K+   586
## 5            2.5K+   475
## 6              5K+   474
## 7             10K+   719
## 8             25K+   606
## 9             50K+   498
## 10           100K+   647
## 11           300K+   451
## 12             1M+   349
# Create a line plot of the number of apps in each review bin
ggplot(bin_counts, aes(x = Review_Category, y = Count, group = 1)) +
  geom_line(color = "blue", linewidth = 1) +
  geom_point(color = "blue", size = 3) +
  labs(title = "Number of Apps by Review Category", 
       x = "Review Category", 
       y = "Number of Apps") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))  # Rotate x-axis labels for readability

As expected, only a few apps have very high review counts, while many apps have few reviews: the 0+ bin (fewer than 100 reviews) alone contains 3,327 apps.
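The review bins above use hand-chosen edges, so their sizes range from 349 apps (1M+) to 3,327 apps (0+). If equal-count bins are wanted instead, one option is to place the breaks at sample quantiles. The sketch below does this on simulated continuous data (an assumption, not the real reviews). Real review counts contain many ties, such as apps with 0 reviews, so some quantile breaks coincide and unique() then leaves fewer, unequal bins.

```r
# Sketch: equal-count (quantile) bins on simulated continuous data
set.seed(7)
x <- rlnorm(1000, meanlog = 6, sdlog = 2)  # stand-in for review counts

# Quartile breakpoints; unique() guards against repeated breaks when data have ties
breaks_q <- unique(quantile(x, probs = seq(0, 1, by = 0.25)))
bins_q <- cut(x, breaks = breaks_q, include.lowest = TRUE)

table(bins_q)  # 250 observations per bin for tie-free data
```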

Boxplots for Rating vs Reviews

boxplot(data_final$Rating ~ Review_Category, 
        main = "Boxplot of Ratings by Review Category", 
        xlab = "Review Category", 
        ylab = "Rating",
        las = 2,        # Rotate the x-axis labels for readability
        col = "lightblue")  # Set color for the boxplots

As the number of reviews increases, the median rating rises and the ratings cluster more tightly around higher values, suggesting that apps with many reviews tend to be better rated.

Mean value of Ratings for each Review bins

# Calculate the mean Rating for each Review_Category
mean_ratings <- tapply(data_final$Rating, Review_Category, mean, na.rm = TRUE)

# Convert the result to a data frame for better readability
mean_ratings_df <- data.frame(Review_Category = names(mean_ratings), Mean_Rating = as.numeric(mean_ratings))

# Print the mean ratings for each review bin
print(mean_ratings_df)
##    Review_Category Mean_Rating
## 1               0+    4.126221
## 2             100+    4.029538
## 3             500+    4.063188
## 4              1K+    4.107030
## 5            2.5K+    4.129572
## 6              5K+    4.191139
## 7             10K+    4.221836
## 8             25K+    4.231848
## 9             50K+    4.293775
## 10           100K+    4.329830
## 11           300K+    4.375610
## 12             1M+    4.426361
# Define correct order of Review_Category as a factor
mean_ratings_df$Review_Category <- factor(mean_ratings_df$Review_Category, 
                                          levels = c("0+","100+", "500+", "1K+",
                                                     "2.5K+", "5K+", "10K+","25K+",
                                                     "50K+", "100K+", "300K+", "1M+"))

# Plot the mean ratings for each review bin in the correct order
ggplot(mean_ratings_df, aes(x = Review_Category, y = Mean_Rating)) +
  geom_bar(stat = "identity", fill = "steelblue") +  # Use bar plot
  labs(title = "Mean Rating by Review Category",
       x = "Review Category",
       y = "Mean Rating") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))  # Rotate x-axis labels for readability

Apart from the lowest bin (0 to 99 reviews, mean 4.13), the mean rating rises steadily with the number of reviews, from 4.03 in the 100+ bin to 4.43 in the 1M+ bin.
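To quantify the trend rather than reading it off the bar chart, the sketch below reuses the twelve printed means. It checks that they increase strictly from the 100+ bin onward and computes a Spearman rank correlation between bin order and mean rating. This is descriptive only: it treats each bin as one point and ignores how many apps each bin contains.

```r
# Mean rating per review bin, copied from the output above
mean_rating <- c(4.126221, 4.029538, 4.063188, 4.107030, 4.129572, 4.191139,
                 4.221836, 4.231848, 4.293775, 4.329830, 4.375610, 4.426361)
bin_order <- seq_along(mean_rating)  # 1 = "0+", ..., 12 = "1M+"

# Strictly increasing from the "100+" bin onward?
all(diff(mean_rating[-1]) > 0)  # TRUE

# Spearman rank correlation between bin order and mean rating
cor(bin_order, mean_rating, method = "spearman")  # about 0.958
```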

Histogram for Reviews and Rating

# Create a new data frame for plotting
plot_data <- data.frame(Rating = data_final$Rating, Review_Category = Review_Category)

# Create a histogram of Ratings, faceted by Review_Category
ggplot(plot_data, aes(x = Rating)) +
  geom_histogram(bins = 30, fill = "blue", alpha = 0.7) +
  facet_wrap(~ Review_Category, labeller = label_wrap_gen()) +  # Facet by Review_Category
  theme_minimal() +
  labs(title = "Histograms of Ratings by Review Category", x = "Rating", y = "Frequency")

This is another representation of ratings vs reviews.

Visualization for Installs vs Size

ggplot(data_clean, aes(x = Size, y = log(Installs))) +
  geom_hex(bins = 30) +
  scale_fill_viridis_c() + # Adds color gradient
  labs(title = "Plot of App Size vs. Installs (Log Scale)",
       x = "Size (MB)",
       y = "Installs (Log Scale)") +
  theme_minimal()

This plot suggests that apps with medium file sizes (around 20-50 MB) tend to be more popular and are installed more often. Larger apps (over 75 MB) are installed less often, possibly because they take up more storage on devices or are slower to download.

Visualization for Reviews vs Installs

# Count plot of install bins vs review bins (both are categorical, so point size shows the number of apps)
data_clean1_factor$Review_Category <- Review_Category
ggplot(data_clean1_factor, aes(x = Review_Category, y = Installs)) +
  geom_count(color = "blue", alpha = 0.5) +
  labs(title = "Installs vs Reviews (Binned)", 
       x = "Review Category", 
       y = "Install Category",
       size = "Number of Apps") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

This plot shows that apps with more installs tend to have more reviews. Most apps fall in the lower install range (up to about 50K installs) and have a moderate number of reviews. Popular apps with over a million installs often have thousands of reviews, showing strong user engagement. However, very few apps reach extremely high review counts (100K+ reviews), indicating that only a small number of apps become extremely popular. In short, the more installs an app has, the more reviews it tends to collect.

Visualization of Mean Rating for Different Install Categories

# Calculate the mean Rating for each install category
mean_ratings <- tapply(data_final$Rating, data_clean1_factor$Installs, mean, na.rm = TRUE)

# Convert the result to a data frame for better readability
mean_ratings_df <- data.frame(Installs = names(mean_ratings), Mean_Rating = as.numeric(mean_ratings))

# Print the mean ratings for each install bin
print(mean_ratings_df)
##      Installs Mean_Rating
## 1          0+    4.247797
## 2        500+    4.176062
## 3         1K+    4.086812
## 4       2.5K+          NA
## 5         5K+    4.035362
## 6        10K+    4.041438
## 7        25K+          NA
## 8        50K+    4.048356
## 9       100K+    4.117373
## 10      300K+    4.168462
## 11        1M+    4.216335
## 12        5M+    4.227677
## 13 Above 10M+    4.316338
mean_ratings_df$Installs <- factor(mean_ratings_df$Installs, levels = levels(data_clean1_factor$Installs))  # Reuse the bin labels ("0+", "500+", ...) as ordered levels

# Plot the mean ratings for each install bin in the correct order
ggplot(mean_ratings_df, aes(x = Installs, y = Mean_Rating)) +
  geom_bar(stat = "identity", fill = "steelblue") +  # Use bar plot
  labs(title = "Mean Rating by Install Category",
       x = "Installs Category",
       y = "Mean Rating") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))  # Rotate x-axis labels for readability

The mean rating does not change monotonically with installs. It falls from 4.25 in the 0+ bin to about 4.04 to 4.05 in the 5K+ to 50K+ bins, then rises steadily to 4.32 for apps with more than 10M installs. This is expected, since a higher rating does not necessarily bring more installs; still, apps that combine many installs with high ratings can be regarded as successful. The 2.5K+ and 25K+ bins show NA because no app falls in those ranges.
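The empty bins follow from how installs are recorded. Assuming Installs only takes the Play Store tier values (0, 1, 5, 10, 50, ..., 1,000,000,000), binning those tiers with the same breaks and labels as the bar plot shows which bins can never receive an app.

```r
# Assumed install tiers (Play Store style "1,000+", "5,000+", ... values)
tiers <- c(0, 1, 5, 10, 50, 100, 500, 1e3, 5e3, 1e4, 5e4, 1e5, 5e5,
           1e6, 5e6, 1e7, 5e7, 1e8, 5e8, 1e9)

# Same breaks and labels as the Installs factor used for the bar plot
breaks <- c(0, 500, 1000, 2500, 5000, 10000, 25000, 50000, 1e5, 3e5, 1e6, 5e6, 1e7, Inf)
labels <- c("0+", "500+", "1K+", "2.5K+", "5K+", "10K+", "25K+",
            "50K+", "100K+", "300K+", "1M+", "5M+", "Above 10M+")

tier_bins <- cut(tiers, breaks = breaks, right = FALSE, labels = labels)

# "2.5K+" and "25K+" receive no tier, so tapply() returns NA for their mean rating
table(tier_bins)
```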

Visualization for Rating vs Installs

# Scatter plot of Rating vs. log-transformed Installs, with a linear trend line
ggplot(data_final, aes(x = log1p(Installs), y = Rating)) +
  geom_point(color = "blue", alpha = 0.3) +
  geom_smooth(method = "lm", color = "red", se = FALSE) +
  labs(title = "Rating vs. Installs (Log Scale)",
       x = "log(Installs + 1)",
       y = "Rating") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5)) # Centers the title

The scatter plot shows at most a weak positive relationship between installs and ratings, as indicated by the nearly flat red trend line. The x-axis is log-transformed installs, so its range of roughly 0 to 21 corresponds to install counts up to 1,000,000,000 (ln(10^9) ≈ 20.7); it is not a rating scale. Points line up in vertical bands because installs are recorded in discrete tiers (1+, 5+, 10+, ..., 1,000,000,000+), and in horizontal bands because ratings are reported to one decimal place. Within each install tier, ratings vary widely, mostly between 4 and 5. Overall, ratings appear to have little association with installs.

Visualization for Rating vs Installs by Category

This scatter plot illustrates the relationship between log-transformed installs and ratings across app categories. Each category has distinct symbols and colors, showing varying distributions in ratings and installs. Apps rated around 4-5 are dense across several categories, while lower-rated apps are sparse. Some categories, such as “LIFESTYLE” and “FINANCE,” have a broad range of installs and ratings, while others are tightly clustered. The spread of installs varies widely, even among apps with similar ratings, suggesting that category is strongly associated with install counts. Overall, apps with higher ratings show greater variability in installs across categories.

Visualization for Category Distribution

category_counts <- table(data_final$Category)

# Convert to data frame for plotting
category_counts_df <- as.data.frame(category_counts)
colnames(category_counts_df) <- c("Category", "Frequency") 

ggplot(category_counts_df, aes(x = reorder(Category, Frequency), y = Frequency)) + 
  geom_bar(stat = "identity", fill = "#1f3374") +
  geom_text(aes(label = Frequency), vjust = 0.5, hjust=1, size=2.5, color='#f8c220') +
  coord_flip() +  
  labs(title = "Distribution of Categories", x = "Category", y = "Frequency") +
  theme_minimal() +
   theme(
    plot.background = element_rect(fill = "#efefef", color = NA),
    panel.background = element_rect(fill = "#efefef", color = NA),
    axis.text.y = element_text(size = 5.5)
  )

As the graph above shows, most apps in the dataset belong to the Family, Game, and Tools categories (1,832, 959, and 827 apps, respectively), while Beauty and Comics have the fewest apps.

Visualization for Category vs. Installs

The boxplot below shows the distribution of installs for each category, ordered by mean log(installs) from highest (top) to lowest (bottom).

ggplot(data_final, aes(x = reorder(Category, log1p(Installs), FUN = mean), y = Installs + 1)) +
  geom_boxplot(outlier.color = "#f05555", outlier.shape = 1, color='#1f3374', fill="#efefef") +  # Red outliers for emphasis
  coord_flip() +  # Flip for better readability
  scale_y_log10(labels = scales::comma) +  # Log-scale axis labelled in install counts
  theme_minimal() +
  labs(title = "Distribution of Installs by Category",
       x = "Category",
       y = "Number of Installs + 1 (Log Scale)") +
    theme(
    plot.background = element_rect(fill = "#efefef", color = NA),
    panel.background = element_rect(fill = "#efefef", color = NA),
    axis.text.y = element_text(size = 5.5)
  )

It can be seen from the graph that, on average, Entertainment apps receive the highest number of installations, followed by Education, Game, Photography, and Weather apps. In contrast, Art & Design apps have the fewest installations.

Visualization for Category vs. App Size

# convert_size <- function(size) {
#   size <- gsub(",", "", size)  # Remove commas
#   size <- tolower(size)        # Make lowercase for consistency
#
#   # Handle "varies with device" by assigning NA
#   if (size == "varies with device") return(NA)
#
#   # Convert "k" to MB (1 MB = 1000 KB)
#   if (grepl("k", size)) return(as.numeric(gsub("k", "", size)) / 1000)
#
#   # Convert "M" to numeric MB
#   if (grepl("m", size)) return(as.numeric(gsub("m", "", size)))
#
#   # Handle numeric values directly (e.g., "1000+")
#   if (grepl("\\d+\\+", size)) return(as.numeric(gsub("\\+", "", size)) / 1000)
#
#   # Default case: return as numeric if possible
#   return(as.numeric(size))
# }
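The commented-out helper above parses raw Size strings one value at a time. A vectorized sketch of the same conversion follows. It is hypothetical (the name convert_size is reused only for illustration), keeps the 1 MB = 1000 KB convention, and, unlike the commented version, returns NA for anything that is not a kB or MB value (for example "1,000+") instead of guessing a size.

```r
# Hypothetical vectorized helper: convert Play Store size strings to megabytes.
# Assumes sizes look like "19M", "201k" or "Varies with device".
convert_size <- function(size) {
  size <- tolower(gsub(",", "", as.character(size)))        # drop commas, normalise case
  mb <- rep(NA_real_, length(size))                         # default: unknown size
  is_k <- grepl("^[0-9.]+k$", size)                         # kilobytes, e.g. "201k"
  is_m <- grepl("^[0-9.]+m$", size)                         # megabytes, e.g. "19m"
  mb[is_k] <- as.numeric(sub("k$", "", size[is_k])) / 1000  # 1 MB = 1000 KB
  mb[is_m] <- as.numeric(sub("m$", "", size[is_m]))
  mb  # anything else, including "varies with device", stays NA
}

convert_size(c("19M", "201k", "8.5M", "Varies with device", "1,000+"))  # 19, 0.201, 8.5, NA, NA
```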

Below is the figure showing the distribution of app sizes in each category.

# df_clean <- data_clean %>%
#   mutate(Size = sapply(Size, convert_size)) %>%
#   filter(!is.na(Size))

# Plot the histogram with faceting by category
ggplot(data_clean, aes(x = Size)) +
  geom_histogram(binwidth = 5, fill = "#304ba6", color = "black") +
  facet_wrap(~ Category, scales = "free_y") +
  theme_minimal() +
  labs(
    title = "Distribution of App Sizes by Category",
    x = "Size (MB)",
    y = "Count"
  ) +
  theme(
    strip.text = element_text(size = 5),
    axis.text.x = element_text(size = 7, angle = 45, hjust = 1)
  )

ggplot(data_clean, aes(x = reorder(Category, Size, FUN = median), y = Size)) + 
  geom_boxplot(outlier.color = "#f05555", outlier.shape = 1) + 
  coord_flip() + 
  theme_minimal() + 
  labs(
    title = "Boxplot of App Sizes by Category (Ordered by Median)", 
    x = "Category", 
    y = "Size (MB)"
  ) + 
  theme(
    strip.text = element_text(size = 8), 
    axis.text.x = element_text(size = 7, angle = 45, hjust = 1)
  )

As the two figures above show, app sizes are right-skewed in most categories, with the majority of apps under 50 MB. The Game category stands out with a noticeably larger median app size than the other categories.

Visualization for Category vs. Reviews

Below is the graph displaying the distribution of reviews left by users for each category.

df_aggregated <- data_final %>% 
  group_by(Category) %>% 
  summarise(Total_Reviews = sum(Reviews, na.rm = TRUE))

#df_aggregated
# Plot the total reviews by category using a bar chart
ggplot(df_aggregated, aes(x = reorder(Category, -Total_Reviews), y = log10(Total_Reviews))) + 
  geom_bar(stat = "identity", fill = "#1f3374") + 
  labs(
    title = "Log-Scaled Total Reviews by Category", 
    x = "Category", 
    y = "Log10(Total Number of Reviews)"
  ) + 
  theme_minimal() + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Game apps have the most reviews in total, while Events apps have the fewest.

Histogram for Category vs. Rating

The figure below shows the distribution of ratings for each category.

ggplot(data_final, aes(x = Rating)) + 
  geom_histogram(binwidth = 0.5, fill = "#1f3374", color='#efefef') + 
  facet_wrap(~ Category, scales = "free_y") +  # Facet by Category with independent y-axis
  scale_x_continuous(limits = c(1, 5), breaks = seq(1, 5, by = 0.5)) +  # Restrict x-axis to 1-5
  theme_minimal() + 
  labs(
    title = "Distribution of Ratings by Category", 
    x = "Rating", 
    y = "Count"
  ) + 
  theme(
    strip.text = element_text(size = 5),  # Adjust facet label size
    axis.text.x = element_text(size = 5, angle = 45, hjust = 1),  # Rotate x-axis labels
    plot.title = element_text(hjust = 0.5)  # Center the plot title
  )

As the graph above shows, in every category most app ratings fall between 4.0 and 5.0.

Visualization for Android Version

Below is the figure showing the distribution of Android versions.

extract_version <- function(version) {
  version <- tolower(as.character(version))  # Make lowercase for consistency
  
  # Handle "Varies with device", "NaN" and missing values
  if (is.na(version) || version %in% c("varies with device", "nan")) return(NA_real_)
  
  # Keep only the leading "major.minor" part, e.g. "4.0.3 and up" -> "4.0",
  # "4.1 - 7.1.1" -> "4.1"; as.numeric("4.0.3") would otherwise return NA
  first_version <- regmatches(version, regexpr("^[0-9]+\\.[0-9]+", version))
  if (length(first_version) == 0) return(NA_real_)  # No version number found
  
  return(as.numeric(first_version))  # Convert to numeric
}
df_clean <- data_final %>%
  mutate(Android_Ver = sapply(Android.Ver, extract_version)) %>%
  filter(!is.na(Android_Ver))  # Remove rows with NA in Android_Ver

android_installs <- data_final %>% 
  group_by(Android.Ver) %>% 
  summarize(Total_Installs = sum(Installs, na.rm = TRUE))
ggplot(df_clean, aes(x = Android_Ver)) + 
  geom_histogram(binwidth = 0.5, fill = "#1f3374", color='#efefef') + 
  scale_x_continuous(breaks = seq(1, 8, by = 1.0)) +  # Set x-axis ticks from 1.0 to 8.0
  theme_minimal() + 
  labs(
    title = "Distribution of Android Versions", 
    x = "Android Version", 
    y = "Count"
  ) + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

As can be seen, most apps require a minimum Android version of around 4.0; the most common requirements are “4.1 and up” (2,202 apps), “4.0.3 and up” (1,395), and “4.0 and up” (1,285).
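Version strings with three components, such as "4.0.3 and up", cannot be converted with as.numeric() directly: as.numeric("4.0.3") returns NA with a warning, so any parser that converts the whole first token drops those apps. A vectorized sketch that keeps only the leading major.minor part follows; the function name parse_min_version is hypothetical.

```r
# Hypothetical helper: map "Android.Ver" strings to a numeric major.minor version
parse_min_version <- function(x) {
  x <- tolower(as.character(x))
  # Keep the leading "major.minor", e.g. "4.0.3 and up" -> "4.0", "4.4w and up" -> "4.4"
  major_minor <- sub("^([0-9]+\\.[0-9]+).*$", "\\1", x)
  # Strings without a leading version ("varies with device") do not match and become NA
  suppressWarnings(as.numeric(major_minor))
}

parse_min_version(c("4.0.3 and up", "4.1 - 7.1.1", "4.4W and up", "Varies with device"))
```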

Bar plot for Android Version vs. Installs

Below is the graph showing the number of installs for each minimum required Android Version.

ggplot(data_final, aes(x = reorder(Android.Ver, Installs, FUN = sum), y = Installs)) +  # Order versions by total installs
  geom_bar(stat = "identity", fill = "#1f3374") + 
  coord_flip() +  # Flip coordinates for better readability
  scale_y_continuous(labels = scales::comma) +  # Format y-axis with commas
  theme_minimal() + 
  labs(
    title = "Total Installs by Android Version", 
    x = "Android Version", 
    y = "Total Installs"
  ) + 
  theme(
    axis.text.y = element_text(size = 8),  # Adjust y-axis text size
    plot.title = element_text(hjust = 0.5)  # Center the plot title
  )

Apps whose minimum version is listed as “Varies with device” account for the highest total number of installs.

Boxplot for Android Version vs. Reviews

Below is the distribution of reviews for each minimum required Android Version.

df_clean <- data_final %>% 
  filter(!is.na(Android.Ver) & !is.na(Reviews)) %>% 
  mutate(Scaled_Reviews = log10(Reviews + 1))
ggplot(df_clean, aes(x = reorder(Android.Ver, Scaled_Reviews, FUN = median), y = Scaled_Reviews)) + 
  geom_boxplot(outlier.color = "#f05555", outlier.shape = 1) +  # Boxplot with red outliers
  coord_flip() +  # Flip coordinates for better readability
  theme_minimal() + 
  labs(
    title = "Distribution of Scaled Reviews by Android Version", 
    x = "Android Version", 
    y = "Scaled Reviews (Log10)"
  ) + 
  theme(
    axis.text.y = element_text(size = 8),  # Adjust y-axis text size
    plot.title = element_text(hjust = 0.5)  # Center the plot title
  )

Apps requiring versions “4.1 - 7.1.1” have the highest median (log-scaled) review count, while apps requiring “5.0 - 7.1.1” have the lowest.

Histogram for Android Version vs. Rating

The plot below shows the distribution of ratings, stacked by minimum required Android version.

ggplot(df_clean, aes(x = Rating, fill = Android.Ver)) + 
  geom_histogram(binwidth = 0.5, position = "stack", color = "black", alpha = 0.7) + 
  scale_x_continuous(breaks = seq(1, 5, by = 0.5)) +  # Set x-axis breaks
  theme_minimal() + 
  labs(
    title = "Histogram of Ratings by Android Version", 
    x = "Rating", 
    y = "Count"
  ) + 
  theme(
    axis.text.x = element_text(size = 8), 
    axis.text.y = element_text(size = 8), 
    plot.title = element_text(hjust = 0.5)  # Center the plot title
  )

For most Android versions, ratings fall mainly between 4.0 and 5.0.

Distribution for Content.Rating

# Convert Content.Rating to a factor
data_final <- data_final %>%
  mutate(
    Content.Rating = as.factor(Content.Rating)
  )

#Content Rating Distribution
content_rating_dist <- table(data_final$Content.Rating)
print("Content Rating Distribution:")
## [1] "Content Rating Distribution:"
print(content_rating_dist)
## 
## adults only 18+        everyone    everyone 10+      mature 17+            teen 
##               3            7903             322             393            1036 
##         unrated 
##               2

Visualization for Content Rating

# Bar plot for Content Rating
ggplot(data_final, aes(x = Content.Rating)) +
  geom_bar(fill = "skyblue") +
  geom_text(stat = "count", aes(label = after_stat(count)), vjust = -0.5) +
  labs(title = "Distribution of App Content Ratings",
       x = "Content Rating",
       y = "Number of Apps") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

“Everyone” is by far the most common content rating, covering 81.82% of all apps (7,903 of 9,659). The rarest ratings are “Adults only 18+” (3 apps, about 0.03%) and “Unrated” (2 apps, about 0.02%).
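The percentages above can be reproduced from the printed counts with prop.table(), which divides each count by the total. A short base-R check:

```r
# Counts copied from the Content Rating distribution printed above
content_counts <- c("adults only 18+" = 3, "everyone" = 7903, "everyone 10+" = 322,
                    "mature 17+" = 393, "teen" = 1036, "unrated" = 2)

# prop.table() divides each count by the total (9,659 apps)
content_pct <- round(100 * prop.table(content_counts), 2)
sort(content_pct, decreasing = TRUE)
```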

# Last Updated Analysis
# Create summary of updates by month and year
updates_by_month <- data_final %>%
  mutate(
    update_month = format(Last.Updated, "%Y-%m"),
    update_year = year(Last.Updated)
  ) %>%
  group_by(update_month) %>%
  summarize(count = n()) %>%
  arrange(update_month)
# Plot updates over time
#ggplot(updates_by_month, aes(x = as.Date(paste0(update_month, "-01")), y = count)) +
  #geom_line(color = "blue") +
  #geom_point(color = "red") +
  #labs(title = "Number of App Updates Over Time",
  #     x = "Date",
  #     y = "Number of Updates") +
  #theme_minimal() +
 # theme(axis.text.x = element_text(angle = 45, hjust = 1))

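The number of apps last updated in each month has risen sharply since the end of 2017: half of the apps were last updated on or after 2018-05-04 (the median Last.Updated date), and the latest date in the data is 2018-08-08.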
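Because the dataset stores a single Last.Updated date per app, counting dates by month counts each app's most recent update, not every update it ever received. A small base-R sketch on hypothetical dates shows the monthly count and the days-since-update measure used later in this section, with days measured from the latest date in the data.

```r
# Hypothetical Last.Updated dates for six apps (not the real data)
last_updated <- as.Date(c("2017-11-03", "2018-02-14", "2018-07-01",
                          "2018-07-20", "2018-08-02", "2018-08-08"))

# Apps per month of their most recent update
per_month <- table(format(last_updated, "%Y-%m"))
per_month

# Days between each app's last update and the latest date in the data
days_since <- as.numeric(difftime(max(last_updated), last_updated, units = "days"))
days_since
```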

# Content Rating and Update Frequency Relationship
update_frequency_by_rating <- data_final %>%
  group_by(Content.Rating) %>%
  summarize(
    avg_last_update = mean(Last.Updated),
    median_last_update = median(Last.Updated),
    n_apps = n()
  )
print("\nUpdate Frequency by Content Rating:")
## [1] "\nUpdate Frequency by Content Rating:"
print(update_frequency_by_rating)
## # A tibble: 6 × 4
##   Content.Rating  avg_last_update median_last_update n_apps
##   <fct>           <date>          <date>              <int>
## 1 adults only 18+ 2018-07-20      2018-07-24              3
## 2 everyone        2017-10-20      2018-04-20           7903
## 3 everyone 10+    2017-11-24      2018-06-06            322
## 4 mature 17+      2018-02-18      2018-07-09            393
## 5 teen            2017-12-03      2018-06-05           1036
## 6 unrated         2013-10-25      2013-10-25              2
# Content Rating Basic Analysis
#print("Basic Content Rating Analysis:")
#content_rating_counts <- table(data_final$Content.Rating)
#print(content_rating_counts)

# Basic bar plot for Content Rating
#ggplot(data_final, aes(x = Content.Rating)) +
#  geom_bar(fill = "skyblue") +
#  geom_text(stat = "count", aes(label = ..count..), vjust = -0.5) +
#  labs(title = "Distribution of App Content Ratings",
#       x = "Content Rating",
#       y = "Number of Apps") +
#   theme_minimal() +
#   theme(axis.text.x = element_text(angle = 45, hjust = 1))
# 
# # Calculate percentages
# content_rating_percentages <- prop.table(content_rating_counts) * 100
# print("\nContent Rating Percentages:")
# print(round(content_rating_percentages, 2))
# 
# # 1.2 Last Updated Basic Analysis
# data_final$Last.Updated <- as.Date(data_final$Last.Updated, format = "%B %d, %Y")
# 
# print("\nLast Updated Summary Statistics:")
# summary(data_final$Last.Updated)
# Create a duplicate dataframe for time-based analysis, use data_time_analysis
data_time_analysis <- data_final %>%
  mutate(Last.Updated = as.Date(Last.Updated, format = "%B %d, %Y"))

# Calculate max(Last.Updated)
max_last_updated <- max(data_time_analysis$Last.Updated, na.rm = TRUE)

#  Add columns for update year, month, quarter, and days since last update
data_time_analysis <- data_time_analysis %>%
  mutate(
    update_year = year(Last.Updated),
    update_month = month(Last.Updated),
    update_quarter = quarter(Last.Updated),
    days_since_update = as.numeric(difftime(max_last_updated, Last.Updated, units = "days"))
  )

# Monthly update pattern
monthly_updates <- data_time_analysis %>%
  group_by(update_year, update_month) %>%
  summarize(count = n()) %>%
  mutate(date = as.Date(paste(update_year, update_month, "01", sep = "-")))

# Plotting monthly update pattern
ggplot(monthly_updates, aes(x = date, y = count)) +
  geom_line(color = "blue") +
  geom_point() +
  labs(title = "App Updates Over Time",
       x = "Date",
       y = "Number of Updates") +
  theme_minimal()

# Content Rating Distribution by Update Quarter
ggplot(data_time_analysis, aes(x = factor(update_quarter), fill = Content.Rating)) +
  geom_bar(position = "dodge") +
  labs(title = "Content Rating Distribution by Quarter",
       x = "Quarter",
       y = "Count") +
  theme_minimal()

# Update Frequency Analysis by Content Rating
update_patterns <- data_time_analysis %>%
  group_by(Content.Rating) %>%
  summarize(
    avg_days_since_update = mean(days_since_update),
    median_days_since_update = median(days_since_update),
    sd_days_since_update = sd(days_since_update),
    n_apps = n(),
    cv = sd(days_since_update) / mean(days_since_update) * 100  # Coefficient of Variation
  ) %>%
  arrange(avg_days_since_update)

print("\nUpdate Patterns by Content Rating:")
## [1] "\nUpdate Patterns by Content Rating:"
print(update_patterns)
## # A tibble: 6 × 6
##   Content.Rating  avg_days_since_update median_days_since_update
##   <fct>                           <dbl>                    <dbl>
## 1 adults only 18+                  18.3                      15 
## 2 mature 17+                      171.                       30 
## 3 teen                            248.                       64 
## 4 everyone 10+                    257.                       63 
## 5 everyone                        292.                      110 
## 6 unrated                        1748.                     1748.
## # ℹ 3 more variables: sd_days_since_update <dbl>, n_apps <int>, cv <dbl>
# Advanced Visualization - Heatmap of Updates
update_heatmap_data <- data_time_analysis %>%
  group_by(update_month, Content.Rating) %>%
  summarize(count = n(), .groups = "drop") %>%
  spread(Content.Rating, count, fill = 0)  # months with no updates count as 0 instead of NA

# Convert to matrix for heatmap
update_matrix <- as.matrix(update_heatmap_data[,-1])
rownames(update_matrix) <- month.abb[update_heatmap_data$update_month]

# Create heatmap
heatmap(update_matrix, 
        Colv = NA, 
        Rowv = NA,
        scale = "column",
        col = colorRampPalette(c("white", "steelblue"))(50),
        main = "Update Pattern Heatmap by Content Rating",
        xlab = "Content Rating",
        ylab = "Month")
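With `scale = "column"`, `heatmap()` centres and scales each column (here, each content rating) before assigning colours, so the shading shows which months are busy relative to that rating's own average, not absolute counts. The sketch below uses a hypothetical two-column count matrix and base R's `scale()`, which we assume performs the same per-column centring and standard-deviation scaling.

```r
# Hypothetical monthly counts for two ratings whose sizes differ tenfold
counts <- matrix(c(10, 20, 30,
                   1, 2, 3),
                 ncol = 2,
                 dimnames = list(c("Jan", "Feb", "Mar"), c("everyone", "teen")))

# Centre each column on its mean and divide by its standard deviation
z <- scale(counts)
print(round(z, 2))
```

Both columns end up with identical scaled values even though one has ten times as many apps, so colours in the heatmap should not be compared across content ratings.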

# 3.4 Time Series Decomposition
# Focus on Everyone category as an example
#everyone_ts <- monthly_updates %>%
#  filter(count > 0) %>%
#  select(count) %>%
#  ts(frequency = 12)

#decomposed <- decompose(everyone_ts)
#plot(decomposed)

# Update Velocity Analysis
update_velocity <- data_time_analysis %>%
  group_by(Content.Rating) %>%
  summarize(
    update_velocity = n() / n_distinct(update_month),
    total_apps = n()
  ) %>%
  arrange(desc(update_velocity))

print("\nUpdate Velocity by Content Rating:")
## [1] "\nUpdate Velocity by Content Rating:"
print(update_velocity)
## # A tibble: 6 × 3
##   Content.Rating  update_velocity total_apps
##   <fct>                     <dbl>      <int>
## 1 everyone                  659.        7903
## 2 teen                       86.3       1036
## 3 mature 17+                 32.8        393
## 4 everyone 10+               26.8        322
## 5 adults only 18+             1.5          3
## 6 unrated                     1            2

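Observation for Update Velocity Analysis:

The update_velocity column is the number of apps in each content rating divided by the number of distinct calendar months in which those apps were last updated. Because each app contributes a single Last.Updated date, this is not a per-app update frequency. For the larger ratings it works out to roughly total_apps / 12 (for example, 7903 / 12 ≈ 659 for everyone), so the ranking mainly reflects how many apps each content rating contains.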
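The velocity values in the table can be reproduced from the printed totals alone. This sketch assumes that each of the four larger ratings has apps last updated in all 12 calendar months, which is what the printed values imply; it is a check on the arithmetic, not a re-run of the analysis.

```r
# Totals copied from the update_velocity output above
total_apps <- c(everyone = 7903, teen = 1036, "mature 17+" = 393, "everyone 10+" = 322)

# Assumption: n_distinct(update_month) is 12 for each of these ratings
active_months <- 12
velocity <- total_apps / active_months
print(round(velocity, 1))
```

The results match the update_velocity column, which supports reading it as a category-size measure.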

# 1. Update Cycle Analysis - use data_update_time_analysis
# Last.Updated is already a Date in data_time_analysis, so it is not re-parsed here
data_update_time_analysis <- data_time_analysis %>%
  mutate(
    day_of_week = wday(Last.Updated, label = TRUE),
    week_of_year = week(Last.Updated),
    month_of_year = month(Last.Updated, label = TRUE),
    # Seasons are assigned from the month number so they do not depend on locale month labels;
    # missing dates stay NA instead of falling into "Fall"
    season = case_when(
      month(Last.Updated) %in% c(12, 1, 2) ~ "Winter",
      month(Last.Updated) %in% c(3, 4, 5) ~ "Spring",
      month(Last.Updated) %in% c(6, 7, 8) ~ "Summer",
      month(Last.Updated) %in% c(9, 10, 11) ~ "Fall"
    )
  )

# Day of Week Update Pattern by Content Rating
dow_pattern <- data_update_time_analysis %>%
  group_by(Content.Rating, day_of_week) %>%
  summarise(count = n()) %>%
  group_by(Content.Rating) %>%
  mutate(percentage = count/sum(count) * 100)

ggplot(dow_pattern, aes(x = day_of_week, y = percentage, fill = Content.Rating)) +
  geom_bar(stat = "identity", position = "dodge") +
  facet_wrap(~Content.Rating) +
  labs(title = "Update Day Preferences by Content Rating",
       x = "Day of Week",
       y = "Percentage of Updates") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

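# 2. Update Interval Analysis: gap in days between consecutive apps' Last.Updated dates
#    within each content rating (each app has one date, so this is not a per-app update cycle)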
update_intervals <- data_update_time_analysis %>%
  group_by(Content.Rating) %>%
  arrange(Last.Updated) %>%
  mutate(days_between_updates = as.numeric(Last.Updated - lag(Last.Updated))) %>%
  summarise(
    mean_interval = mean(days_between_updates, na.rm = TRUE),
    median_interval = median(days_between_updates, na.rm = TRUE),
    std_dev = sd(days_between_updates, na.rm = TRUE),
    cv = std_dev / mean_interval * 100  # Coefficient of Variation
  )

print("Update Interval Analysis:")
## [1] "Update Interval Analysis:"
print(update_intervals)
## # A tibble: 6 × 5
##   Content.Rating  mean_interval median_interval std_dev    cv
##   <fct>                   <dbl>           <dbl>   <dbl> <dbl>
## 1 adults only 18+        15                  15    7.07  47.1
## 2 everyone                0.380               0    3.53 929. 
## 3 everyone 10+            8.33                1   46.5  557. 
## 4 mature 17+              5.48                0   21.5  392. 
## 5 teen                    2.36                0   14.7  622. 
## 6 unrated              1213                1213   NA     NA
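The zero medians above follow from how the intervals are built: within a content rating, apps are sorted by Last.Updated and each date is differenced against the previous app's date, so many apps updated on the same day produce gaps of 0. The sketch below uses six hypothetical dates for one rating and base R's `sort()` and `diff()` as a stand-in for `arrange()` plus `lag()`.

```r
# Hypothetical Last.Updated values for six apps sharing one content rating
last_updated <- as.Date(c("2018-08-01", "2018-07-30", "2018-08-01",
                          "2018-08-03", "2018-08-01", "2018-08-01"))

# Sorting then differencing mirrors arrange(Last.Updated) + (Last.Updated - lag(Last.Updated))
gaps <- as.numeric(diff(sort(last_updated)))

print(gaps)          # days between consecutive apps, not between updates of one app
print(median(gaps))
print(mean(gaps))
```

Four same-day updates give three zero gaps, pulling the median to 0 while the mean stays positive, which matches the pattern in the everyone, mature 17+ and teen rows.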
# 3. Seasonal Update Intensity: apps per distinct Last.Updated date within each rating and season
seasonal_intensity <- data_update_time_analysis %>%
  group_by(Content.Rating, season) %>%
  summarise(
    update_count = n(),
    update_intensity = n() / n_distinct(Last.Updated)
  ) %>%
  arrange(Content.Rating, desc(update_intensity))

# Visualization of seasonal patterns
ggplot(seasonal_intensity, aes(x = season, y = update_intensity, fill = Content.Rating)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "Seasonal Update Intensity by Content Rating",
       x = "Season",
       y = "Update Intensity") +
  theme_minimal()

# 4. Update Clustering Analysis
#update_features <- data_final %>%
#  group_by(Content.Rating) %>%
#  summarise(
#    mean_week = mean(week_of_year),
#    std_week = sd(week_of_year),
#    update_frequency = n(),
#    weekend_ratio = sum(day_of_week %in% c("Sat", "Sun")) / n()
#  )

# Normalize the features
#update_features_norm <- scale(update_features[,-1])
#rownames(update_features_norm) <- update_features$Content.Rating

# Perform hierarchical clustering
#update_clusters <- hclust(dist(update_features_norm))
#plot(update_clusters, main = "Hierarchical Clustering of Content Ratings by Update Patterns")
# 6. Update Consistency Score
#consistency_score <- data_final %>%
#  group_by(Content.Rating) %>%
#  summarise(
#    total_updates = n(),
#    unique_days = n_distinct(Last.Updated),
#   consistency_score = (total_updates / unique_days) * 
#      (1 - sd(as.numeric(day_of_week)) / 7)  # Normalized consistency metric
#  ) %>%
#  arrange(desc(consistency_score))

#print("\nUpdate Consistency Scores:")
#print(consistency_score)
# Convert Last.Updated to numeric (days since reference date) if not already done
# reference_date <- min(data_final$Last.Updated, na.rm = TRUE)  # Reference date
# data_final$Days.Since.Update <- as.numeric(data_final$Last.Updated - reference_date)
# 
# # Perform the Kolmogorov-Smirnov test on the numeric 'Days.Since.Update' values
# content_ratings <- unique(data_final$Content.Rating)
# ks_results <- data.frame(
#   rating1 = character(),
#   rating2 = character(),
#   p_value = numeric()
# )
# 
# for (i in 1:(length(content_ratings)-1)) {
#   for (j in (i+1):length(content_ratings)) {
#     # Extract groups, removing NA values
#     group1 <- na.omit(data_final$Days.Since.Update[data_final$Content.Rating == content_ratings[i]])
#     group2 <- na.omit(data_final$Days.Since.Update[data_final$Content.Rating == content_ratings[j]])
#     
#     # Check if both groups have enough data for comparison
#     if(length(group1) > 1 && length(group2) > 1) {
#       ks_test <- ks.test(group1, group2)
#       ks_results <- rbind(ks_results, 
#                           data.frame(rating1 = content_ratings[i],
#                                      rating2 = content_ratings[j],
#                                      p_value = ks_test$p.value))
#     }
#   }
# }
# 
# print("\nKolmogorov-Smirnov Test Results:")
# print(ks_results[ks_results$p_value < 0.05,])

Visualization for Content Rating vs Installs

# 1. Basic statistics for Installs by Content Rating
installs_by_rating <- data_final %>%
  group_by(Content.Rating) %>%
  summarise(
    mean_installs = mean(Installs, na.rm = TRUE),
    median_installs = median(Installs, na.rm = TRUE),
    total_installs = sum(Installs, na.rm = TRUE),
    n_apps = n()
  ) %>%
  arrange(desc(mean_installs))

print("Summary of Installs by Content Rating:")
## [1] "Summary of Installs by Content Rating:"
print(installs_by_rating)
## # A tibble: 6 × 5
##   Content.Rating  mean_installs median_installs total_installs n_apps
##   <fct>                   <dbl>           <dbl>          <dbl>  <int>
## 1 teen                15914358.          500000    16487275393   1036
## 2 everyone 10+        12472894.         1000000     4016271795    322
## 3 everyone             6602474.           50000    52179352961   7903
## 4 mature 17+           6203529.          500000     2437986878    393
## 5 adults only 18+       666667.          500000        2000000      3
## 6 unrated                25250            25250          50500      2
# 2. Visualize distribution of installs by content rating
ggplot(data_final, aes(x = Content.Rating, y = log10(Installs))) +
  geom_boxplot(fill = "lightblue") +
  labs(title = "Distribution of App Installs by Content Rating",
       x = "Content Rating",
       y = "Log10(Number of Installs)") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Visualization for Last Updated vs Installs

data_analysis <- data_final %>%
  mutate(
    days_since_update = as.numeric(difftime(max(Last.Updated), Last.Updated, units = "days")),
    update_year = year(Last.Updated),
    update_month = month(Last.Updated)
  )


data_analysis <- data_analysis %>%
  mutate(update_recency = ifelse(days_since_update <= median(days_since_update),
                                "Recent Update", "Old Update"))

recent_vs_old <- data_analysis %>%
  group_by(Content.Rating, update_recency) %>%
  summarise(
    mean_installs = mean(Installs, na.rm = TRUE),
    median_installs = median(Installs, na.rm = TRUE),
    n_apps = n()
  )

print("\nComparison of Installs by Update Recency and Content Rating:")
## [1] "\nComparison of Installs by Update Recency and Content Rating:"
print(recent_vs_old)
## # A tibble: 10 × 5
## # Groups:   Content.Rating [6]
##    Content.Rating  update_recency mean_installs median_installs n_apps
##    <fct>           <chr>                  <dbl>           <dbl>  <int>
##  1 adults only 18+ Recent Update        666667.          500000      3
##  2 everyone        Old Update          1787608.           10000   4110
##  3 everyone        Recent Update      11819742.          500000   3793
##  4 everyone 10+    Old Update          2711120.          100000    135
##  5 everyone 10+    Recent Update      19520163.         1000000    187
##  6 mature 17+      Old Update           875646.          100000    118
##  7 mature 17+      Recent Update       8489675.          500000    275
##  8 teen            Old Update          1625562.           50000    441
##  9 teen            Recent Update      26504878.         1000000    595
## 10 unrated         Old Update            25250            25250      2
# 7. Visualization of update recency effect
ggplot(data_analysis, aes(x = Content.Rating, y = log10(Installs), fill = update_recency)) +
  geom_boxplot() +
  labs(title = "Install Distribution by Content Rating and Update Recency",
       x = "Content Rating",
       y = "Log10(Number of Installs)",
       fill = "Update Recency") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Visualization for Last Updated vs Content Rating vs Installs

# 3. Timeline analysis: Average installs over time by content rating
installs_timeline <- data_final %>%
  group_by(Content.Rating, Last.Updated) %>%
  summarise(avg_installs = mean(Installs, na.rm = TRUE)) %>%
  ungroup()

ggplot(installs_timeline, aes(x = Last.Updated, y = log10(avg_installs), color = Content.Rating)) +
  geom_smooth(method = "loess", se = FALSE) +
  labs(title = "Average App Installs Over Time by Content Rating",
       x = "Last Updated Date",
       y = "Log10(Average Installs)") +
  theme_minimal() +
  theme(legend.position = "bottom")

Statistical Tests

Statistical test for Installs and Price

# Check for missing values and ensure no negative/zero values in log_Installs
#data_final <- data_final %>%
  #filter(!is.na(Installs), Installs > 0)  # Remove missing values and zeros in Installs

# Apply log transformation, adding 1 to avoid log(0)
data_final$log_Installs <- log(data_final$Installs + 1)

# Ensure Price_Category has no missing values
data_final <- data_final %>%
  filter(!is.na(Price_Category))

# Perform Welch's t-test on log-transformed Installs by Price Category
t_test_result <- t.test(log_Installs ~ Price_Category, data = data_final, var.equal = FALSE)

# Print t-test results
print(t_test_result)
## 
##  Welch Two Sample t-test
## 
## data:  log_Installs by Price_Category
## t = 29.262, df = 981.81, p-value < 2.2e-16
## alternative hypothesis: true difference in means between group Free and group Paid is not equal to 0
## 95 percent confidence interval:
##  3.552682 4.063439
## sample estimates:
## mean in group Free mean in group Paid 
##          10.996530           7.188469

Welch's t-test on log-transformed installs gives t = 29.26 and p < 2.2e-16, so the mean of log(Installs + 1) differs significantly between Free apps (11.00) and Paid apps (7.19).

In practical terms, free apps in this dataset are installed far more often than paid apps, which is consistent with what is generally observed in the app market.
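Because the test compares means of the natural log of (Installs + 1), the difference in means is a log ratio, and exponentiating it gives an approximate multiplicative gap between the groups' geometric means (approximate because of the +1 shift). The numbers below are copied from the Welch test output above.

```r
# Group means of log(Installs + 1) and the 95% CI of their difference, from the output above
mean_free <- 10.996530
mean_paid <- 7.188469
ci <- c(3.552682, 4.063439)

# exp() of a difference of log means is a ratio of geometric means
ratio <- exp(mean_free - mean_paid)
ratio_ci <- exp(ci)

print(round(ratio, 1))
print(round(ratio_ci, 1))
```

On this reading, free apps have roughly 45 times the geometric-mean installs of paid apps, with a 95% interval of roughly 35 to 58 times.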

T-Tests for Reviews, Rating, and Price

# Confirming with t-tests
# Perform t-test for Reviews between Free and Paid
t_test_reviews <- t.test(Reviews ~ Price_Category, data = data_final)

# Perform t-test for Rating between Free and Paid
t_test_rating <- t.test(Rating ~ Price_Category, data = data_final)

# Print the results
print(t_test_reviews)
## 
##  Welch Two Sample t-test
## 
## data:  Reviews by Price_Category
## t = 11.019, df = 9299.1, p-value < 2.2e-16
## alternative hypothesis: true difference in means between group Free and group Paid is not equal to 0
## 95 percent confidence interval:
##  185401.3 265636.3
## sample estimates:
## mean in group Free mean in group Paid 
##         234243.689           8724.888
print(t_test_rating)
## 
##  Welch Two Sample t-test
## 
## data:  Rating by Price_Category
## t = -3.9443, df = 883.57, p-value = 8.638e-05
## alternative hypothesis: true difference in means between group Free and group Paid is not equal to 0
## 95 percent confidence interval:
##  -0.1121028 -0.0376075
## sample estimates:
## mean in group Free mean in group Paid 
##           4.167384           4.242239
  • Free and Paid apps differ significantly in mean number of reviews (t = 11.02, p < 2.2e-16): Free apps average about 234,244 reviews versus about 8,725 for Paid apps.

  • Mean ratings also differ significantly (t = -3.94, p = 8.6e-05), but in the opposite direction: Paid apps average 4.24 versus 4.17 for Free apps, a gap of only about 0.07 points (95% CI 0.04 to 0.11) on the 5-point scale.
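All three tests above use Welch's t-test (the default in `t.test()`), which does not assume equal variances and uses the Welch-Satterthwaite approximation for degrees of freedom; that is why the reported df values are fractional. The sketch below computes both quantities by hand on synthetic, right-skewed samples (not Play Store data) and compares them with `t.test()`.

```r
# Welch's t statistic and Welch-Satterthwaite degrees of freedom, computed by hand
welch_by_hand <- function(x, y) {
  vx <- var(x) / length(x)  # squared standard error of mean(x)
  vy <- var(y) / length(y)  # squared standard error of mean(y)
  t_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)
  df <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
  c(t = t_stat, df = df)
}

# Synthetic samples standing in for Free and Paid apps
set.seed(42)
free <- rlnorm(500, meanlog = 11, sdlog = 2)
paid <- rlnorm(60, meanlog = 7, sdlog = 2)

by_hand <- welch_by_hand(log(free), log(paid))
builtin <- t.test(log(free), log(paid), var.equal = FALSE)

print(by_hand)
print(c(t = unname(builtin$statistic), df = unname(builtin$parameter)))
```

Applied to the document's own per-group vectors, the same function should agree with the printed t and df values up to rounding.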

ANOVA Test for Ratings by Review Category

The test below checks whether the average rating differs across review-count categories.

anova_result <- aov(Rating ~ as.factor(Review_Category), data = data_clean)
summary(anova_result)
##                              Df Sum Sq Mean Sq F value Pr(>F)    
## as.factor(Review_Category)   11  106.3   9.662   41.36 <2e-16 ***
## Residuals                  9647 2253.6   0.234                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The p-value (< 2e-16) is far below 0.05, so we reject the null hypothesis that all review categories have the same mean rating: at least one review category has a different average rating.
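A significant F test does not say how large the effect is. The sketch below recomputes the F value from the printed sums of squares and adds eta squared (between-group sum of squares over total sum of squares), a common effect-size measure for one-way ANOVA. The inputs are copied from `summary(anova_result)` above, so small differences from the printed F come from rounding.

```r
# Sums of squares and degrees of freedom copied from summary(anova_result)
ss_between <- 106.3
ss_within <- 2253.6
df_between <- 11
df_within <- 9647

# F = mean square between / mean square within
f_value <- (ss_between / df_between) / (ss_within / df_within)

# Share of the variance in Rating explained by Review_Category
eta_squared <- ss_between / (ss_between + ss_within)

print(round(c(F = f_value, eta_squared = eta_squared), 3))
```

Review category explains only about 4.5% of the variance in ratings, so the difference is statistically clear but modest in size.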

Post Hoc Test - Review Category

# Perform Tukey's HSD
tukey_result <- TukeyHSD(anova_result)
tukey_result
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = Rating ~ as.factor(Review_Category), data = data_clean)
## 
## $`as.factor(Review_Category)`
##                     diff          lwr         upr     p adj
## 100+-0+     -0.096683215 -0.152307271 -0.04105916 0.0000009
## 500+-0+     -0.063032835 -0.141474646  0.01540898 0.2646281
## 1K+-0+      -0.019190832 -0.089971134  0.05158947 0.9992526
## 2.5K+-0+     0.003350463 -0.074143085  0.08084401 1.0000000
## 5K+-0+       0.064918154 -0.012646893  0.14248320 0.2087515
## 10K+-0+      0.095614797  0.030638525  0.16059107 0.0000973
## 25K+-0+      0.105627098  0.035846939  0.17540726 0.0000488
## 50K+-0+      0.167554014  0.091642554  0.24346547 0.0000000
## 100K+-0+     0.203608898  0.135724795  0.27149300 0.0000000
## 300K+-0+     0.249388670  0.170111342  0.32866600 0.0000000
## 1M+-0+       0.300139945  0.211244127  0.38903576 0.0000000
## 500+-100+    0.033650380 -0.054364565  0.12166533 0.9848292
## 1K+-100+     0.077492383 -0.003768703  0.15875347 0.0784345
## 2.5K+-100+   0.100033678  0.012862795  0.18720456 0.0096675
## 5K+-100+     0.161601369  0.074366918  0.24883582 0.0000001
## 10K+-100+    0.192298012  0.116039053  0.26855697 0.0000000
## 25K+-100+    0.202310313  0.121918874  0.28270175 0.0000000
## 50K+-100+    0.264237229  0.178469737  0.35000472 0.0000000
## 100K+-100+   0.300292113  0.221540831  0.37904339 0.0000000
## 300K+-100+   0.346071885  0.257311491  0.43483228 0.0000000
## 1M+-100+     0.396823160  0.299375844  0.49427048 0.0000000
## 1K+-500+     0.043842003 -0.054455739  0.14213974 0.9515761
## 2.5K+-500+   0.066383298 -0.036853541  0.16962014 0.6214468
## 5K+-500+     0.127950989  0.024660470  0.23124151 0.0030189
## 10K+-500+    0.158647632  0.064443010  0.25285225 0.0000025
## 25K+-500+    0.168659933  0.071079887  0.26623998 0.0000011
## 50K+-500+    0.230586849  0.128532233  0.33264146 0.0000000
## 100K+-500+   0.266641733  0.170408442  0.36287502 0.0000000
## 300K+-500+   0.312421505  0.207839051  0.41700396 0.0000000
## 1M+-500+     0.363172780  0.251123410  0.47522215 0.0000000
## 2.5K+-1K+    0.022541295 -0.075001405  0.12008400 0.9998394
## 5K+-1K+      0.084108986 -0.013490527  0.18170850 0.1727899
## 10K+-1K+     0.114805629  0.026878134  0.20273312 0.0012014
## 25K+-1K+     0.124817930  0.033283243  0.21635262 0.0005180
## 50K+-1K+     0.186744846  0.090454254  0.28303544 0.0000000
## 100K+-1K+    0.222799730  0.132702117  0.31289734 0.0000000
## 300K+-1K+    0.268579502  0.169613735  0.36754527 0.0000000
## 1M+-1K+      0.319330777  0.212504774  0.42615678 0.0000000
## 5K+-2.5K+    0.061567691 -0.041004546  0.16413993 0.7193424
## 10K+-2.5K+   0.092264334 -0.001152170  0.18568084 0.0565429
## 25K+-2.5K+   0.102276635  0.005457227  0.19909604 0.0276896
## 50K+-2.5K+   0.164203551  0.062875978  0.26553112 0.0000078
## 100K+-2.5K+  0.200258435  0.104796512  0.29572036 0.0000000
## 300K+-2.5K+  0.246038206  0.142165102  0.34991131 0.0000000
## 1M+-2.5K+    0.296789482  0.185401898  0.40817707 0.0000000
## 10K+-5K+     0.030696643 -0.062779181  0.12417247 0.9957463
## 25K+-5K+     0.040708944 -0.056167701  0.13758559 0.9685508
## 50K+-5K+     0.102635860  0.001253596  0.20401812 0.0440982
## 100K+-5K+    0.138690744  0.043170771  0.23421072 0.0001331
## 300K+-5K+    0.184470516  0.080544059  0.28839697 0.0000004
## 1M+-5K+      0.235221791  0.123784453  0.34665913 0.0000000
## 25K+-10K+    0.010012302 -0.077112114  0.09713672 0.9999999
## 50K+-10K+    0.071939217 -0.020169104  0.16404754 0.3070668
## 100K+-10K+   0.107994101  0.022380758  0.19360745 0.0022235
## 300K+-10K+   0.153773873  0.058872409  0.24867534 0.0000078
## 1M+-10K+     0.204525148  0.101453039  0.30759726 0.0000000
## 50K+-25K+    0.061926916 -0.033630908  0.15748474 0.6094814
## 100K+-25K+   0.097981800  0.008667751  0.18729585 0.0175649
## 300K+-25K+   0.143761571  0.045508620  0.24201452 0.0001113
## 1M+-25K+     0.194512847  0.088346871  0.30067882 0.0000001
## 100K+-50K+   0.036054884 -0.058127272  0.13023704 0.9846717
## 300K+-50K+   0.081834656 -0.020863551  0.18453286 0.2768896
## 1M+-50K+     0.132585931  0.022293168  0.24287869 0.0048805
## 300K+-100K+  0.045779772 -0.051135776  0.14269532 0.9282456
## 1M+-100K+    0.096531047 -0.008398431  0.20146052 0.1064662
## 1M+-300K+    0.050751275 -0.061884591  0.16338714 0.9479902
# Convert the result to a data frame
tukey_df <- as.data.frame(tukey_result$`as.factor(Review_Category)`)

# Filter for significant adjusted p-values (the "p adj" column)
significant_tukey <- tukey_df[tukey_df$`p adj` < 0.05, ]

# Display the significant results
print(significant_tukey)
##                    diff          lwr         upr        p adj
## 100+-0+     -0.09668322 -0.152307271 -0.04105916 8.987756e-07
## 10K+-0+      0.09561480  0.030638525  0.16059107 9.732720e-05
## 25K+-0+      0.10562710  0.035846939  0.17540726 4.884843e-05
## 50K+-0+      0.16755401  0.091642554  0.24346547 0.000000e+00
## 100K+-0+     0.20360890  0.135724795  0.27149300 0.000000e+00
## 300K+-0+     0.24938867  0.170111342  0.32866600 0.000000e+00
## 1M+-0+       0.30013994  0.211244127  0.38903576 0.000000e+00
## 2.5K+-100+   0.10003368  0.012862795  0.18720456 9.667490e-03
## 5K+-100+     0.16160137  0.074366918  0.24883582 9.538328e-08
## 10K+-100+    0.19229801  0.116039053  0.26855697 0.000000e+00
## 25K+-100+    0.20231031  0.121918874  0.28270175 0.000000e+00
## 50K+-100+    0.26423723  0.178469737  0.35000472 0.000000e+00
## 100K+-100+   0.30029211  0.221540831  0.37904339 0.000000e+00
## 300K+-100+   0.34607188  0.257311491  0.43483228 0.000000e+00
## 1M+-100+     0.39682316  0.299375844  0.49427048 0.000000e+00
## 5K+-500+     0.12795099  0.024660470  0.23124151 3.018884e-03
## 10K+-500+    0.15864763  0.064443010  0.25285225 2.473396e-06
## 25K+-500+    0.16865993  0.071079887  0.26623998 1.080775e-06
## 50K+-500+    0.23058685  0.128532233  0.33264146 0.000000e+00
## 100K+-500+   0.26664173  0.170408442  0.36287502 0.000000e+00
## 300K+-500+   0.31242150  0.207839051  0.41700396 0.000000e+00
## 1M+-500+     0.36317278  0.251123410  0.47522215 0.000000e+00
## 10K+-1K+     0.11480563  0.026878134  0.20273312 1.201416e-03
## 25K+-1K+     0.12481793  0.033283243  0.21635262 5.179950e-04
## 50K+-1K+     0.18674485  0.090454254  0.28303544 1.572425e-08
## 100K+-1K+    0.22279973  0.132702117  0.31289734 0.000000e+00
## 300K+-1K+    0.26857950  0.169613735  0.36754527 0.000000e+00
## 1M+-1K+      0.31933078  0.212504774  0.42615678 0.000000e+00
## 25K+-2.5K+   0.10227664  0.005457227  0.19909604 2.768961e-02
## 50K+-2.5K+   0.16420355  0.062875978  0.26553112 7.808701e-06
## 100K+-2.5K+  0.20025843  0.104796512  0.29572036 3.507883e-10
## 300K+-2.5K+  0.24603821  0.142165102  0.34991131 0.000000e+00
## 1M+-2.5K+    0.29678948  0.185401898  0.40817707 0.000000e+00
## 50K+-5K+     0.10263586  0.001253596  0.20401812 4.409823e-02
## 100K+-5K+    0.13869074  0.043170771  0.23421072 1.331239e-04
## 300K+-5K+    0.18447052  0.080544059  0.28839697 4.428778e-07
## 1M+-5K+      0.23522179  0.123784453  0.34665913 2.244942e-10
## 100K+-10K+   0.10799410  0.022380758  0.19360745 2.223466e-03
## 300K+-10K+   0.15377387  0.058872409  0.24867534 7.832139e-06
## 1M+-10K+     0.20452515  0.101453039  0.30759726 5.942656e-09
## 100K+-25K+   0.09798180  0.008667751  0.18729585 1.756493e-02
## 300K+-25K+   0.14376157  0.045508620  0.24201452 1.113055e-04
## 1M+-25K+     0.19451285  0.088346871  0.30067882 1.436204e-07
## 1M+-50K+     0.13258593  0.022293168  0.24287869 4.880458e-03

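44 of the 66 pairwise comparisons are significant at the 5% level. Apps with more reviews tend to have higher average ratings: in 43 of the 44 significant pairs, the category with more reviews has the higher mean rating (the exception is 100+ vs 0+, where 100+ is about 0.10 points lower). The largest gap is between 1M+ and 100+ (about 0.40 points), and 1M+ vs 0+ differs by about 0.30 points. Neighbouring high-review categories, such as 100K+ vs 50K+ or 1M+ vs 300K+, do not differ significantly.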

To make Ratings and Reviews easier to compare against Installs, Installs can also be grouped into categories.

ANOVA Test for Installs by Category

# Ensure that Category is treated as a factor
data_final$Category <- as.factor(data_final$Category)

# Perform one-way ANOVA
anova_result <- aov(log_Installs ~ Category, data = data_final)

# Summary of the ANOVA result
summary(anova_result)
##               Df Sum Sq Mean Sq F value Pr(>F)    
## Category      32  21252   664.1   39.09 <2e-16 ***
## Residuals   9626 163522    17.0                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

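Since the ANOVA indicates significant differences in mean log installs across app categories (F = 39.09, p < 2e-16), we conduct a post-hoc analysis (Tukey's HSD) to determine which specific categories differ from each other. Category accounts for about 11.5% of the variance in log installs (21252 / (21252 + 163522)).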
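With 33 categories, Tukey's HSD produces 528 pairwise comparisons, so it helps to keep only the significant pairs and sort them by effect size, as was done for review categories. The sketch below shows that filtering on R's built-in `PlantGrowth` dataset, which stands in for `data_final`; for the document's model the matrix would be `TukeyHSD(anova_result)$Category`.

```r
# PlantGrowth ships with base R (datasets package) and stands in for data_final
fit <- aov(weight ~ group, data = PlantGrowth)
tukey <- TukeyHSD(fit)

# TukeyHSD() returns one matrix per factor, named after the factor
tukey_pairs <- as.data.frame(tukey$group)

# Keep pairs with adjusted p-value below 0.05, largest absolute difference first
sig <- tukey_pairs[tukey_pairs[["p adj"]] < 0.05, ]
sig <- sig[order(abs(sig$diff), decreasing = TRUE), ]

print(sig)
```

Sorting by the absolute difference puts the largest category gaps first, which is usually more informative than scanning the full table in factor order.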

# Post-hoc test (as ANOVA is significant)
TukeyHSD(anova_result)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = log_Installs ~ Category, data = data_final)
## 
## $Category
##                                                 diff          lwr         upr
## AUTO_AND_VEHICLES-ART_AND_DESIGN        -1.333946079 -3.923524642  1.25563248
## BEAUTY-ART_AND_DESIGN                   -0.754689386 -3.660721545  2.15134277
## BOOKS_AND_REFERENCE-ART_AND_DESIGN      -1.510577243 -3.730572003  0.70941752
## BUSINESS-ART_AND_DESIGN                 -3.461218500 -5.560851817 -1.36158518
## COMICS-ART_AND_DESIGN                    0.199142684 -2.663993658  3.06227903
## COMMUNICATION-ART_AND_DESIGN             0.001519125 -2.143888145  2.14692640
## DATING-ART_AND_DESIGN                   -1.737471023 -4.030350864  0.55540882
## EDUCATION-ART_AND_DESIGN                 2.174632189 -0.250846617  4.60011100
## ENTERTAINMENT-ART_AND_DESIGN             3.571787284  1.076621663  6.06695291
## EVENTS-ART_AND_DESIGN                   -3.101147241 -5.867199735 -0.33509475
## FAMILY-ART_AND_DESIGN                   -0.808748355 -2.798513651  1.18101694
## FINANCE-ART_AND_DESIGN                  -1.048969776 -3.178567267  1.08062771
## FOOD_AND_DRINK-ART_AND_DESIGN            0.241573356 -2.210270124  2.69341684
## GAME-ART_AND_DESIGN                      1.898985933 -0.121119019  3.91909088
## HEALTH_AND_FITNESS-ART_AND_DESIGN       -0.334869967 -2.497192668  1.82745273
## HOUSE_AND_HOME-ART_AND_DESIGN            0.554439863 -2.116530710  3.22541044
## LIBRARIES_AND_DEMO-ART_AND_DESIGN       -1.020655132 -3.616846100  1.57553583
## LIFESTYLE-ART_AND_DESIGN                -1.722814971 -3.841547747  0.39591781
## MAPS_AND_NAVIGATION-ART_AND_DESIGN      -0.085300113 -2.471611303  2.30101108
## MEDICAL-ART_AND_DESIGN                  -3.682553432 -5.790954438 -1.57415242
## NEWS_AND_MAGAZINES-ART_AND_DESIGN       -0.943248560 -3.131726240  1.24522912
## PARENTING-ART_AND_DESIGN                -0.058770575 -2.870546041  2.75300489
## PERSONALIZATION-ART_AND_DESIGN          -1.478643094 -3.594458795  0.63717261
## PHOTOGRAPHY-ART_AND_DESIGN               1.840400248 -0.326813802  4.00761430
## PRODUCTIVITY-ART_AND_DESIGN             -0.567762994 -2.684401409  1.54887542
## SHOPPING-ART_AND_DESIGN                  1.104670412 -1.139782785  3.34912361
## SOCIAL-ART_AND_DESIGN                    0.055522421 -2.146733487  2.25777833
## SPORTS-ART_AND_DESIGN                   -0.684952080 -2.824778474  1.45487431
## TOOLS-ART_AND_DESIGN                    -0.499004900 -2.529170746  1.53116095
## TRAVEL_AND_LOCAL-ART_AND_DESIGN          0.002872826 -2.220521946  2.22626760
## VIDEO_PLAYERS-ART_AND_DESIGN             1.135736938 -1.172415812  3.44388969
## WEATHER-ART_AND_DESIGN                   1.505772021 -1.125706885  4.13725093
## BEAUTY-AUTO_AND_VEHICLES                 0.579256694 -2.159336677  3.31785006
## BOOKS_AND_REFERENCE-AUTO_AND_VEHICLES   -0.176631164 -2.172438638  1.81917631
## BUSINESS-AUTO_AND_VEHICLES              -2.127272421 -3.988275055 -0.26626979
## COMICS-AUTO_AND_VEHICLES                 1.533088763 -1.159943069  4.22612060
## COMMUNICATION-AUTO_AND_VEHICLES          1.335465204 -0.577031357  3.24796176
## DATING-AUTO_AND_VEHICLES                -0.403524944 -2.480101141  1.67305125
## EDUCATION-AUTO_AND_VEHICLES              3.508578268  1.286458069  5.73069847
## ENTERTAINMENT-AUTO_AND_VEHICLES          4.905733363  2.607751158  7.20371557
## EVENTS-AUTO_AND_VEHICLES                -1.767201162 -4.356779725  0.82237740
## FAMILY-AUTO_AND_VEHICLES                 0.525197724 -1.210900392  2.26129584
## FINANCE-AUTO_AND_VEHICLES                0.284976303 -1.609768062  2.17972067
## FOOD_AND_DRINK-AUTO_AND_VEHICLES         1.575519436 -0.675348662  3.82638753
## GAME-AUTO_AND_VEHICLES                   3.232932012  1.462142709  5.00372132
## HEALTH_AND_FITNESS-AUTO_AND_VEHICLES     0.999076112 -0.932376762  2.93052899
## HOUSE_AND_HOME-AUTO_AND_VEHICLES         1.888385942 -0.599375007  4.37614689
## LIBRARIES_AND_DEMO-AUTO_AND_VEHICLES     0.313290947 -2.094005935  2.72058783
## LIFESTYLE-AUTO_AND_VEHICLES             -0.388868892 -2.271393610  1.49365583
## MAPS_AND_NAVIGATION-AUTO_AND_VEHICLES    1.248645966 -0.930654801  3.42794673
## MEDICAL-AUTO_AND_VEHICLES               -2.348607352 -4.219496324 -0.47771838
## NEWS_AND_MAGAZINES-AUTO_AND_VEHICLES     0.390697519 -1.569992486  2.35138752
## PARENTING-AUTO_AND_VEHICLES              1.275175505 -1.363186111  3.91353712
## PERSONALIZATION-AUTO_AND_VEHICLES       -0.144697015 -2.023938037  1.73454401
## PHOTOGRAPHY-AUTO_AND_VEHICLES            3.174346327  1.237418999  5.11127366
## PRODUCTIVITY-AUTO_AND_VEHICLES           0.766183086 -1.113984174  2.64635034
## SHOPPING-AUTO_AND_VEHICLES               2.438616491  0.415638268  4.46159471
## SOCIAL-AUTO_AND_VEHICLES                 1.389468500 -0.586588641  3.36552564
## SPORTS-AUTO_AND_VEHICLES                 0.648993999 -1.257239915  2.55522791
## TOOLS-AUTO_AND_VEHICLES                  0.834941179 -0.947316967  2.61719933
## TRAVEL_AND_LOCAL-AUTO_AND_VEHICLES       1.336818905 -0.662769816  3.33640763
## VIDEO_PLAYERS-AUTO_AND_VEHICLES          2.469683017  0.376255240  4.56311079
## WEATHER-AUTO_AND_VEHICLES                2.839718100  0.394405856  5.28503034
## BOOKS_AND_REFERENCE-BEAUTY              -0.755887858 -3.148031926  1.63625621
## BUSINESS-BEAUTY                         -2.706529114 -4.987414157 -0.42564407
## COMICS-BEAUTY                            0.953832069 -2.044755302  3.95241944
## COMMUNICATION-BEAUTY                     0.756208511 -1.566881860  3.07929888
## DATING-BEAUTY                           -0.982781638 -3.442715473  1.47715220
## EDUCATION-BEAUTY                         2.929321575  0.345347126  5.51329602
## ENTERTAINMENT-BEAUTY                     4.326476670  1.676980870  6.97597247
## EVENTS-BEAUTY                           -2.346457856 -5.252490014  0.55957430
## FAMILY-BEAUTY                           -0.054058970 -2.234229248  2.12611131
## FINANCE-BEAUTY                          -0.294280390 -2.602778167  2.01421739
## FOOD_AND_DRINK-BEAUTY                    0.996262742 -1.612475069  3.60500055
## GAME-BEAUTY                              2.653675319  0.445780274  4.86157036
## HEALTH_AND_FITNESS-BEAUTY                0.419819418 -1.918901595  2.75854043
## HOUSE_AND_HOME-BEAUTY                    1.309129248 -1.506551897  4.12481039
## LIBRARIES_AND_DEMO-BEAUTY               -0.265965747 -3.010812564  2.47888107
## LIFESTYLE-BEAUTY                        -0.968125585 -3.266604449  1.33035328
## MAPS_AND_NAVIGATION-BEAUTY               0.669389273 -1.877855829  3.21663437
## MEDICAL-BEAUTY                          -2.927864046 -5.216822612 -0.63890548
## NEWS_AND_MAGAZINES-BEAUTY               -0.188559175 -2.551483437  2.17436509
## PARENTING-BEAUTY                         0.695918811 -2.253667226  3.64550485
## PERSONALIZATION-BEAUTY                  -0.723953708 -3.019743898  1.57183648
## PHOTOGRAPHY-BEAUTY                       2.595089634  0.251845462  4.93833381
## PRODUCTIVITY-BEAUTY                      0.186926392 -2.109622039  2.48347482
## SHOPPING-BEAUTY                          1.859359797 -0.555499756  4.27421935
## SOCIAL-BEAUTY                            0.810211806 -1.565479166  3.18590278
## SPORTS-BEAUTY                            0.069737306 -2.248200034  2.38767464
## TOOLS-BEAUTY                             0.255684486 -1.961419453  2.47278842
## TRAVEL_AND_LOCAL-BEAUTY                  0.757562212 -1.637737525  3.15286195
## VIDEO_PLAYERS-BEAUTY                     1.890426323 -0.583749424  4.36460207
## WEATHER-BEAUTY                           2.260461407 -0.517785837  5.03870865
## BUSINESS-BOOKS_AND_REFERENCE            -1.950641256 -3.249021171 -0.65226134
## COMICS-BOOKS_AND_REFERENCE               1.709719927 -0.630126231  4.04956609
## COMMUNICATION-BOOKS_AND_REFERENCE        1.512096368  0.140928001  2.88326474
## DATING-BOOKS_AND_REFERENCE              -0.226893780 -1.818945380  1.36515782
## EDUCATION-BOOKS_AND_REFERENCE            3.685209433  1.907492420  5.46292644
## ENTERTAINMENT-BOOKS_AND_REFERENCE        5.082364528  3.210685652  6.95404340
## EVENTS-BOOKS_AND_REFERENCE              -1.590569998 -3.810564757  0.62942476
## FAMILY-BOOKS_AND_REFERENCE               0.701828888 -0.410149649  1.81380742
## FINANCE-BOOKS_AND_REFERENCE              0.461607468 -0.884689605  1.80790454
## FOOD_AND_DRINK-BOOKS_AND_REFERENCE       1.752150600 -0.061372709  3.56567391
## GAME-BOOKS_AND_REFERENCE                 3.409563176  2.244164381  4.57496197
## HEALTH_AND_FITNESS-BOOKS_AND_REFERENCE   1.175707276 -0.221779683  2.57319424
## HOUSE_AND_HOME-BOOKS_AND_REFERENCE       2.065017106 -0.035319566  4.16535378
## LIBRARIES_AND_DEMO-BOOKS_AND_REFERENCE   0.489922111 -1.514457562  2.49430178
## LIFESTYLE-BOOKS_AND_REFERENCE           -0.212237727 -1.541282088  1.11680663
## MAPS_AND_NAVIGATION-BOOKS_AND_REFERENCE  1.425277131 -0.298617133  3.14917139
## MEDICAL-BOOKS_AND_REFERENCE             -2.171976188 -3.484487194 -0.85946518
## NEWS_AND_MAGAZINES-BOOKS_AND_REFERENCE   0.567328683 -0.870296056  2.00495342
## PARENTING-BOOKS_AND_REFERENCE            1.451806669 -0.824904031  3.72851737
## PERSONALIZATION-BOOKS_AND_REFERENCE      0.031934150 -1.292454924  1.35632322
## PHOTOGRAPHY-BOOKS_AND_REFERENCE          3.350977492  1.945934050  4.75602093
## PRODUCTIVITY-BOOKS_AND_REFERENCE         0.942814250 -0.382888779  2.26851728
## SHOPPING-BOOKS_AND_REFERENCE             2.615247655  1.093767952  4.13672736
## SOCIAL-BOOKS_AND_REFERENCE               1.566099664  0.107586236  3.02461309
## SPORTS-BOOKS_AND_REFERENCE               0.825625163 -0.536794498  2.18804482
## TOOLS-BOOKS_AND_REFERENCE                1.011572343 -0.171180250  2.19432494
## TRAVEL_AND_LOCAL-BOOKS_AND_REFERENCE     1.513450070  0.023210273  3.00368987
## VIDEO_PLAYERS-BOOKS_AND_REFERENCE        2.646314181  1.032344091  4.26028427
## WEATHER-BOOKS_AND_REFERENCE              3.016349264  0.966468398  5.06623013
## COMICS-BUSINESS                          3.660361184  1.434386480  5.88633589
## COMMUNICATION-BUSINESS                   3.462737625  2.296467491  4.62900776
## DATING-BUSINESS                          1.723747477  0.304342097  3.14315286
## EDUCATION-BUSINESS                       5.635850689  4.010931888  7.26076949
## ENTERTAINMENT-BUSINESS                   7.033005784  5.305792668  8.76021890
## EVENTS-BUSINESS                          0.360071259 -1.739562058  2.45970458
## FAMILY-BUSINESS                          2.652470144  1.805959896  3.49898039
## FINANCE-BUSINESS                         2.412248724  1.275323422  3.54917403
## FOOD_AND_DRINK-BUSINESS                  3.702791856  2.038775593  5.36680812
## GAME-BUSINESS                            5.360204433  4.444651219  6.27575765
## HEALTH_AND_FITNESS-BUSINESS              3.126348533  1.929246559  4.32345051
## HOUSE_AND_HOME-BUSINESS                  4.015658362  2.042970483  5.98834624
## LIBRARIES_AND_DEMO-BUSINESS              2.440563367  0.570370541  4.31075619
## LIFESTYLE-BUSINESS                       1.738403529  0.621961749  2.85484531
## MAPS_AND_NAVIGATION-BUSINESS             3.375918387  1.810065659  4.94177112
## MEDICAL-BUSINESS                        -0.221334932 -1.318042949  0.87537309
## NEWS_AND_MAGAZINES-BUSINESS              2.517969939  1.274246407  3.76169347
## PARENTING-BUSINESS                       3.402447925  1.242935261  5.56196059
## PERSONALIZATION-BUSINESS                 1.982575406  0.871679482  3.09347133
## PHOTOGRAPHY-BUSINESS                     5.301618748  4.095703988  6.50753351
## PRODUCTIVITY-BUSINESS                    2.893455506  1.780993438  4.00591757
## SHOPPING-BUSINESS                        4.565888912  3.226118848  5.90565897
## SOCIAL-BUSINESS                          3.516740920  2.248929912  4.78455193
## SPORTS-BUSINESS                          2.776266420  1.620294678  3.93223816
## TOOLS-BUSINESS                           2.962213600  2.024670517  3.89975668
## TRAVEL_AND_LOCAL-BUSINESS                3.464091326  2.159906529  4.76827612
## VIDEO_PLAYERS-BUSINESS                   4.596955437  3.153008490  6.04090238
## WEATHER-BUSINESS                         4.966990521  3.048111967  6.88586907
## COMMUNICATION-COMICS                    -0.197623559 -2.466825108  2.07157799
## DATING-COMICS                           -1.936613707 -4.345721690  0.47249428
## EDUCATION-COMICS                         1.975489505 -0.560146658  4.51112567
## ENTERTAINMENT-COMICS                     3.372644600  0.770269763  5.97501944
## EVENTS-COMICS                           -3.300289925 -6.163426266 -0.43715358
## FAMILY-COMICS                           -1.007891039 -3.130547216  1.11476514
## FINANCE-COMICS                          -1.248112460 -3.502372601  1.00614768
## FOOD_AND_DRINK-COMICS                    0.042430673 -2.518436323  2.60329767
## GAME-COMICS                              1.699843249 -0.451279093  3.85096559
## HEALTH_AND_FITNESS-COMICS               -0.534012651 -2.819213469  1.75118817
## HOUSE_AND_HOME-COMICS                    0.355297179 -2.416090040  3.12668440
## LIBRARIES_AND_DEMO-COMICS               -1.219797816 -3.919188646  1.47959301
## LIFESTYLE-COMICS                        -1.921957655 -4.165956739  0.32204143
## MAPS_AND_NAVIGATION-COMICS              -0.284442797 -2.782639028  2.21375343
## MEDICAL-COMICS                          -3.881696115 -6.115942771 -1.64744946
## NEWS_AND_MAGAZINES-COMICS               -1.142391244 -3.452356150  1.16757366
## PARENTING-COMICS                        -0.257913258 -3.165246158  2.64941964
## PERSONALIZATION-COMICS                  -1.677785778 -3.919030833  0.56345928
## PHOTOGRAPHY-COMICS                       1.641257564 -0.648572135  3.93108726
## PRODUCTIVITY-COMICS                     -0.766905677 -3.008927421  1.47511607
## SHOPPING-COMICS                          0.905527728 -1.457536695  3.26859215
## SOCIAL-COMICS                           -0.143620263 -2.466642949  2.17940242
## SPORTS-COMICS                           -0.884094764 -3.148020626  1.37983110
## TOOLS-COMICS                            -0.698147584 -2.858720812  1.46242564
## TRAVEL_AND_LOCAL-COMICS                 -0.196269858 -2.539342120  2.14680241
## VIDEO_PLAYERS-COMICS                     0.936594254 -1.487054324  3.36024283
## WEATHER-COMICS                           1.306629337 -1.426717429  4.03997610
## DATING-COMMUNICATION                    -1.738990148 -3.225268696 -0.25271160
## EDUCATION-COMMUNICATION                  2.173113064  0.489464230  3.85676190
## ENTERTAINMENT-COMMUNICATION              3.570268159  1.787692091  5.35284423
## EVENTS-COMMUNICATION                    -3.102666366 -5.248073636 -0.95725910
## FAMILY-COMMUNICATION                    -0.810267480 -1.764674044  0.14413908
## FINANCE-COMMUNICATION                   -1.050488901 -2.269878679  0.16890088
## FOOD_AND_DRINK-COMMUNICATION             0.240054231 -1.481358681  1.96146714
## GAME-COMMUNICATION                       1.897466808  0.881322145  2.91361147
## HEALTH_AND_FITNESS-COMMUNICATION        -0.336389092 -1.612071429  0.93929324
## HOUSE_AND_HOME-COMMUNICATION             0.552920738 -1.468417868  2.57425934
## LIBRARIES_AND_DEMO-COMMUNICATION        -1.022174257 -2.943614732  0.89926622
## LIFESTYLE-COMMUNICATION                 -1.724334096 -2.924648445 -0.52401975
## MAPS_AND_NAVIGATION-COMMUNICATION       -0.086819238 -1.713535887  1.53989741
## MEDICAL-COMMUNICATION                   -3.684072556 -4.866054267 -2.50209085
## NEWS_AND_MAGAZINES-COMMUNICATION        -0.944767685 -2.264298100  0.37476273
## PARENTING-COMMUNICATION                 -0.060289699 -2.264333083  2.14375368
## PERSONALIZATION-COMMUNICATION           -1.480162219 -2.675319967 -0.28500447
## PHOTOGRAPHY-COMMUNICATION                1.838881123  0.554925245  3.12283700
## PRODUCTIVITY-COMMUNICATION              -0.569282118 -1.765895733  0.62733150
## SHOPPING-COMMUNICATION                   1.103151287 -0.307272802  2.51357538
## SOCIAL-COMMUNICATION                     0.054003296 -1.288254892  1.39626148
## SPORTS-COMMUNICATION                    -0.686471205 -1.923638516  0.55069611
## TOOLS-COMMUNICATION                     -0.500524025 -1.536525627  0.53547758
## TRAVEL_AND_LOCAL-COMMUNICATION           0.001353701 -1.375312662  1.37802006
## VIDEO_PLAYERS-COMMUNICATION              1.134217813 -0.375515633  2.64395126
## WEATHER-COMMUNICATION                    1.504252896 -0.464606467  3.47311226
## EDUCATION-DATING                         3.912103212  2.044163437  5.78004299
## ENTERTAINMENT-DATING                     5.309258307  3.351682499  7.26683412
## EVENTS-DATING                           -1.363676218 -3.656556058  0.92920362
## FAMILY-DATING                            0.928722668 -0.322443786  2.17988912
## FINANCE-DATING                           0.688501247 -0.774863724  2.15186622
## FOOD_AND_DRINK-DATING                    1.979044380  0.076996005  3.88109275
## GAME-DATING                              3.636456956  2.337582219  4.93533169
## HEALTH_AND_FITNESS-DATING                1.402601056 -0.107991884  2.91319400
## HOUSE_AND_HOME-DATING                    2.291910886  0.114679762  4.46914201
## LIBRARIES_AND_DEMO-DATING                0.716815891 -1.368000433  2.80163222
## LIFESTYLE-DATING                         0.014656052 -1.432852201  1.46216431
## MAPS_AND_NAVIGATION-DATING               1.652170910 -0.164620950  3.46896277
## MEDICAL-DATING                          -1.945082408 -3.377425374 -0.51273944
## NEWS_AND_MAGAZINES-DATING                0.794222463 -0.753577951  2.34202288
## PARENTING-DATING                         1.678700449 -0.669135329  4.02653623
## PERSONALIZATION-DATING                   0.258827929 -1.184407204  1.70206306
## PHOTOGRAPHY-DATING                       3.577871271  2.060284930  5.09545761
## PRODUCTIVITY-DATING                      1.169708030 -0.274732952  2.61414901
## SHOPPING-DATING                          2.842141435  1.216158127  4.46812474
## SOCIAL-DATING                            1.792993444  0.225772129  3.36021476
## SPORTS-DATING                            1.052518943 -0.425692328  2.53073021
## TOOLS-DATING                             1.238466123 -0.076001422  2.55293367
## TRAVEL_AND_LOCAL-DATING                  1.740343849  0.143554608  3.33713309
## VIDEO_PLAYERS-DATING                     2.873207961  1.160368014  4.58604791
## WEATHER-DATING                           3.243243044  1.114644257  5.37184183
## ENTERTAINMENT-EDUCATION                  1.397155095 -0.714183940  3.50849413
## EVENTS-EDUCATION                        -5.275779430 -7.701258236 -2.85030062
## FAMILY-EDUCATION                        -2.983380544 -4.463605856 -1.50315523
## FINANCE-EDUCATION                       -3.223601965 -4.887058161 -1.56014577
## FOOD_AND_DRINK-EDUCATION                -1.933058833 -3.993019390  0.12690172
## GAME-EDUCATION                          -0.275646256 -1.796410886  1.24511837
## HEALTH_AND_FITNESS-EDUCATION            -2.509502156 -4.214653325 -0.80435099
## HOUSE_AND_HOME-EDUCATION                -1.620192326 -3.936651876  0.69626722
## LIBRARIES_AND_DEMO-EDUCATION            -3.195287321 -5.425109866 -0.96546478
## LIFESTYLE-EDUCATION                     -3.897447160 -5.546971218 -2.24792310
## MAPS_AND_NAVIGATION-EDUCATION           -2.259932302 -4.241442367 -0.27842224
## MEDICAL-EDUCATION                       -5.857185621 -7.493417831 -4.22095341
## NEWS_AND_MAGAZINES-EDUCATION            -3.117880749 -4.856079702 -1.37968180
## PARENTING-EDUCATION                     -2.233402764 -4.710897934  0.24409241
## PERSONALIZATION-EDUCATION               -3.653275283 -5.299050822 -2.00749974
## PHOTOGRAPHY-EDUCATION                   -0.334231941 -2.045581636  1.37711775
## PRODUCTIVITY-EDUCATION                  -2.742395183 -4.389228273 -1.09556209
## SHOPPING-EDUCATION                      -1.069961777 -2.878129688  0.73820613
## SOCIAL-EDUCATION                        -2.119109768 -3.874624544 -0.36359499
## SPORTS-EDUCATION                        -2.859584269 -4.536115820 -1.18305272
## TOOLS-EDUCATION                         -2.673637089 -4.207740868 -1.13953331
## TRAVEL_AND_LOCAL-EDUCATION              -2.171759363 -3.953720462 -0.38979826
## VIDEO_PLAYERS-EDUCATION                 -1.038895252 -2.925551065  0.84776056
## WEATHER-EDUCATION                       -0.668860168 -2.939671097  1.60195076
## EVENTS-ENTERTAINMENT                    -6.672934525 -9.168100147 -4.17776890
## FAMILY-ENTERTAINMENT                    -4.380535640 -5.972380799 -2.78869048
## FINANCE-ENTERTAINMENT                   -4.620757060 -6.384273591 -2.85724053
## FOOD_AND_DRINK-ENTERTAINMENT            -3.330213928 -5.471788474 -1.18863938
## GAME-ENTERTAINMENT                      -1.672801351 -3.302411458 -0.04319124
## HEALTH_AND_FITNESS-ENTERTAINMENT        -3.906657251 -5.709556180 -2.10375832
## HOUSE_AND_HOME-ENTERTAINMENT            -3.017347422 -5.406675546 -0.62801930
## LIBRARIES_AND_DEMO-ENTERTAINMENT        -4.592442417 -6.897873531 -2.28701130
## LIFESTYLE-ENTERTAINMENT                 -5.294602255 -7.044983259 -3.54422125
## MAPS_AND_NAVIGATION-ENTERTAINMENT       -3.657087397 -5.723312501 -1.59086229
## MEDICAL-ENTERTAINMENT                   -7.254340716 -8.992201437 -5.51647999
## NEWS_AND_MAGAZINES-ENTERTAINMENT        -4.515035845 -6.349222214 -2.68084948
## PARENTING-ENTERTAINMENT                 -3.630557859 -6.176316362 -1.08479936
## PERSONALIZATION-ENTERTAINMENT           -5.050430378 -6.797279303 -3.30358145
## PHOTOGRAPHY-ENTERTAINMENT               -1.731387036 -3.540149547  0.07737547
## PRODUCTIVITY-ENTERTAINMENT              -4.139550278 -5.887395599 -2.39170496
## SHOPPING-ENTERTAINMENT                  -2.467116872 -4.367741831 -0.56649191
## SOCIAL-ENTERTAINMENT                    -3.516264864 -5.366869134 -1.66566059
## SPORTS-ENTERTAINMENT                    -4.256739364 -6.032594674 -2.48088405
## TOOLS-ENTERTAINMENT                     -4.070792184 -5.712857483 -2.42872689
## TRAVEL_AND_LOCAL-ENTERTAINMENT          -3.568914458 -5.444624830 -1.69320409
## VIDEO_PLAYERS-ENTERTAINMENT             -2.436050347 -4.411493132 -0.46060756
## WEATHER-ENTERTAINMENT                   -2.066015263 -4.411113623  0.27908310
## FAMILY-EVENTS                            2.292398886  0.302633591  4.28216418
## FINANCE-EVENTS                           2.052177465 -0.077420025  4.18177496
## FOOD_AND_DRINK-EVENTS                    3.342720597  0.890877117  5.79456408
## GAME-EVENTS                              5.000133174  2.980028222  7.02023813
## HEALTH_AND_FITNESS-EVENTS                2.766277274  0.603954573  4.92859998
## HOUSE_AND_HOME-EVENTS                    3.655587104  0.984616531  6.32655768
## LIBRARIES_AND_DEMO-EVENTS                2.080492109 -0.515698858  4.67668308
## LIFESTYLE-EVENTS                         1.378332270 -0.740400506  3.49706505
## MAPS_AND_NAVIGATION-EVENTS               3.015847128  0.629535938  5.40215832
## MEDICAL-EVENTS                          -0.581406190 -2.689807197  1.52699482
## NEWS_AND_MAGAZINES-EVENTS                2.157898681 -0.030578999  4.34637636
## PARENTING-EVENTS                         3.042376667  0.230601200  5.85415213
## PERSONALIZATION-EVENTS                   1.622504147 -0.493311554  3.73831985
## PHOTOGRAPHY-EVENTS                       4.941547489  2.774333440  7.10876154
## PRODUCTIVITY-EVENTS                      2.533384248  0.416745832  4.65002266
## SHOPPING-EVENTS                          4.205817653  1.961364456  6.45027085
## SOCIAL-EVENTS                            3.156669662  0.954413754  5.35892557
## SPORTS-EVENTS                            2.416195161  0.276368767  4.55602156
## TOOLS-EVENTS                             2.602142341  0.571976496  4.63230819
## TRAVEL_AND_LOCAL-EVENTS                  3.104020067  0.880625295  5.32741484
## VIDEO_PLAYERS-EVENTS                     4.236884179  1.928731429  6.54503693
## WEATHER-EVENTS                           4.606919262  1.975440356  7.23839817
## FINANCE-FAMILY                          -0.240221421 -1.158537818  0.67809498
## FOOD_AND_DRINK-FAMILY                    1.050321712 -0.472719978  2.57336340
## GAME-FAMILY                              2.707734288  2.084080899  3.33138768
## HEALTH_AND_FITNESS-FAMILY                0.473878388 -0.517967869  1.46572465
## HOUSE_AND_HOME-FAMILY                    1.363188218 -0.492129365  3.21850580
## LIBRARIES_AND_DEMO-FAMILY               -0.211906777 -1.957852673  1.53403912
## LIFESTYLE-FAMILY                        -0.914066616 -1.806898128 -0.02123510
## MAPS_AND_NAVIGATION-FAMILY               0.723448243 -0.691684483  2.13858097
## MEDICAL-FAMILY                          -2.873805076 -3.741834072 -2.00577608
## NEWS_AND_MAGAZINES-FAMILY               -0.134500205 -1.182142362  0.91314195
## PARENTING-FAMILY                         0.749977781 -1.302874091  2.80282965
## PERSONALIZATION-FAMILY                  -0.669894739 -1.555781648  0.21599217
## PHOTOGRAPHY-FAMILY                       2.649148604  1.646683506  3.65161370
## PRODUCTIVITY-FAMILY                      0.240985362 -0.646864692  1.12883542
## SHOPPING-FAMILY                          1.913418767  0.753380110  3.07345742
## SOCIAL-FAMILY                            0.864270776 -0.211856827  1.94039838
## SPORTS-FAMILY                            0.123796275 -0.817998039  1.06559059
## TOOLS-FAMILY                             0.309743455 -0.345766190  0.96525310
## TRAVEL_AND_LOCAL-FAMILY                  0.811621182 -0.307129840  1.93037220
## VIDEO_PLAYERS-FAMILY                     1.944485293  0.665544853  3.22342573
## WEATHER-FAMILY                           2.314520376  0.516521276  4.11251948
## FOOD_AND_DRINK-FINANCE                   1.290543132 -0.411125323  2.99221159
## GAME-FINANCE                             2.947955709  1.965630359  3.93028106
## HEALTH_AND_FITNESS-FINANCE               0.714099809 -0.534811133  1.96301075
## HOUSE_AND_HOME-FINANCE                   1.603409639 -0.401140904  3.60796018
## LIBRARIES_AND_DEMO-FINANCE               0.028314644 -1.875457038  1.93208633
## LIFESTYLE-FINANCE                       -0.673845195 -1.845667559  0.49797717
## MAPS_AND_NAVIGATION-FINANCE              0.963669663 -0.642138598  2.56947792
## MEDICAL-FINANCE                         -2.633583656 -3.786620464 -1.48054685
## NEWS_AND_MAGAZINES-FINANCE               0.105721216 -1.187945523  1.39938795
## PARENTING-FINANCE                        0.990199201 -1.198458001  3.17885640
## PERSONALIZATION-FINANCE                 -0.429673318 -1.596213141  0.73686651
## PHOTOGRAPHY-FINANCE                      2.889370024  1.632009370  4.14673068
## PRODUCTIVITY-FINANCE                     0.481206782 -0.686824578  1.64923814
## SHOPPING-FINANCE                         2.153640188  0.767382914  3.53989746
## SOCIAL-FINANCE                           1.104492197 -0.212348781  2.42133317
## SPORTS-FINANCE                           0.364017696 -0.845526050  1.57356144
## TOOLS-FINANCE                            0.549964876 -0.452887275  1.55281703
## TRAVEL_AND_LOCAL-FINANCE                 1.051842602 -0.300053619  2.40373882
## VIDEO_PLAYERS-FINANCE                    2.184706713  0.697525422  3.67188800
## WEATHER-FINANCE                          2.554741797  0.603121878  4.50636172
## GAME-FOOD_AND_DRINK                      1.657412577  0.094942073  3.21988308
## HEALTH_AND_FITNESS-FOOD_AND_DRINK       -0.576443324 -2.318892612  1.16600596
## HOUSE_AND_HOME-FOOD_AND_DRINK            0.312866506 -2.031184231  2.65691724
## LIBRARIES_AND_DEMO-FOOD_AND_DRINK       -1.262228489 -3.520700892  0.99624391
## LIFESTYLE-FOOD_AND_DRINK                -1.964388327 -3.652440055 -0.27633660
## MAPS_AND_NAVIGATION-FOOD_AND_DRINK      -0.326873469 -2.340569363  1.68682242
## MEDICAL-FOOD_AND_DRINK                  -3.924126788 -5.599192417 -2.24906116
## NEWS_AND_MAGAZINES-FOOD_AND_DRINK       -1.184821917 -2.959624613  0.58998078
## PARENTING-FOOD_AND_DRINK                -0.300343931 -2.803656003  2.20296814
## PERSONALIZATION-FOOD_AND_DRINK          -1.720216450 -3.404605402 -0.03582750
## PHOTOGRAPHY-FOOD_AND_DRINK               1.598826892 -0.149688706  3.34734249
## PRODUCTIVITY-FOOD_AND_DRINK             -0.809336350 -2.494758624  0.87608592
## SHOPPING-FOOD_AND_DRINK                  0.863097055 -0.980285761  2.70647987
## SOCIAL-FOOD_AND_DRINK                   -0.186050936 -1.977815748  1.60571388
## SPORTS-FOOD_AND_DRINK                   -0.926525436 -2.640977843  0.78792697
## TOOLS-FOOD_AND_DRINK                    -0.740578256 -2.316034831  0.83487832
## TRAVEL_AND_LOCAL-FOOD_AND_DRINK         -0.238700530 -2.056384324  1.57898326
## VIDEO_PLAYERS-FOOD_AND_DRINK             0.894163581 -1.026268447  2.81459561
## WEATHER-FOOD_AND_DRINK                   1.264198665 -1.034751376  3.56314871
## HEALTH_AND_FITNESS-GAME                 -2.233855900 -3.285244070 -1.18246773
## HOUSE_AND_HOME-GAME                     -1.344546070 -3.232365294  0.54327315
## LIBRARIES_AND_DEMO-GAME                 -2.919641065 -4.700086279 -1.13919585
## LIFESTYLE-GAME                          -3.621800904 -4.580344694 -2.66325711
## MAPS_AND_NAVIGATION-GAME                -1.984286046 -3.441769739 -0.52680235
## MEDICAL-GAME                            -5.581539364 -6.517024493 -4.64605424
## NEWS_AND_MAGAZINES-GAME                 -2.842234493 -3.946413906 -1.73805508
## PARENTING-GAME                          -1.957756507 -4.040029039  0.12451602
## PERSONALIZATION-GAME                    -3.377629027 -4.329707649 -2.42555040
## PHOTOGRAPHY-GAME                        -0.058585685 -1.119997178  1.00282581
## PRODUCTIVITY-GAME                       -2.466748926 -3.420654479 -1.51284337
## SHOPPING-GAME                           -0.794315521 -2.005656832  0.41702579
## SOCIAL-GAME                             -1.843463512 -2.974705619 -0.71222141
## SPORTS-GAME                             -2.583938013 -3.588246042 -1.57962998
## TOOLS-GAME                              -2.397990833 -3.140520954 -1.65546071
## TRAVEL_AND_LOCAL-GAME                   -1.896113107 -3.067975699 -0.72425051
## VIDEO_PLAYERS-GAME                      -0.763248995 -2.088898548  0.56240056
## WEATHER-GAME                            -0.393213912 -2.224732094  1.43830427
## HOUSE_AND_HOME-HEALTH_AND_FITNESS        0.889309830 -1.149973591  2.92859325
## LIBRARIES_AND_DEMO-HEALTH_AND_FITNESS   -0.685785165 -2.626094575  1.25452425
## LIFESTYLE-HEALTH_AND_FITNESS            -1.387945004 -2.618238320 -0.15765169
## MAPS_AND_NAVIGATION-HEALTH_AND_FITNESS   0.249569854 -1.399391687  1.89853140
## MEDICAL-HEALTH_AND_FITNESS              -3.347683464 -4.560097532 -2.13526940
## NEWS_AND_MAGAZINES-HEALTH_AND_FITNESS   -0.608378593 -1.955237020  0.73847983
## PARENTING-HEALTH_AND_FITNESS             0.276099393 -1.944412788  2.49661157
## PERSONALIZATION-HEALTH_AND_FITNESS      -1.143773127 -2.369036017  0.08148976
## PHOTOGRAPHY-HEALTH_AND_FITNESS           2.175270215  0.863245137  3.48729529
## PRODUCTIVITY-HEALTH_AND_FITNESS         -0.232893026 -1.459576055  0.99379000
## SHOPPING-HEALTH_AND_FITNESS              1.439540379  0.003516974  2.87556378
## SOCIAL-HEALTH_AND_FITNESS                0.390392388 -0.978740240  1.75952502
## SPORTS-HEALTH_AND_FITNESS               -0.350082113 -1.616356201  0.91619197
## TOOLS-HEALTH_AND_FITNESS                -0.164134933 -1.234726558  0.90645669
## TRAVEL_AND_LOCAL-HEALTH_AND_FITNESS      0.337742793 -1.065139020  1.74062461
## VIDEO_PLAYERS-HEALTH_AND_FITNESS         1.470606905 -0.063069131  3.00428294
## WEATHER-HEALTH_AND_FITNESS               1.840641988 -0.146636125  3.82792010
## LIBRARIES_AND_DEMO-HOUSE_AND_HOME       -1.575094995 -4.069738245  0.91954825
## LIFESTYLE-HOUSE_AND_HOME                -2.277254834 -4.270259094 -0.28425057
## MAPS_AND_NAVIGATION-HOUSE_AND_HOME      -0.639739976 -2.915156092  1.63567614
## MEDICAL-HOUSE_AND_HOME                  -4.236993294 -6.219010501 -2.25497609
## NEWS_AND_MAGAZINES-HOUSE_AND_HOME       -1.497688423 -3.564684301  0.56930746
## PARENTING-HOUSE_AND_HOME                -0.613210437 -3.331503777  2.10508290
## PERSONALIZATION-HOUSE_AND_HOME          -2.033082957 -4.022985841 -0.04318007
## PHOTOGRAPHY-HOUSE_AND_HOME               1.285960385 -0.758508774  3.33042954
## PRODUCTIVITY-HOUSE_AND_HOME             -1.122202856 -3.112980490  0.86857478
## SHOPPING-HOUSE_AND_HOME                  0.550230549 -1.575941494  2.67640259
## SOCIAL-HOUSE_AND_HOME                   -0.498917442 -2.580495807  1.58266092
## SPORTS-HOUSE_AND_HOME                   -1.239391943 -3.254806144  0.77602226
## TOOLS-HOUSE_AND_HOME                    -1.053444763 -2.952026014  0.84513649
## TRAVEL_AND_LOCAL-HOUSE_AND_HOME         -0.551567036 -2.655497102  1.55236303
## VIDEO_PLAYERS-HOUSE_AND_HOME             0.581297075 -1.612012416  2.77460657
## WEATHER-HOUSE_AND_HOME                   0.951332158 -1.580015040  3.48267936
## LIFESTYLE-LIBRARIES_AND_DEMO            -0.702159839 -2.593770190  1.18945051
## MAPS_AND_NAVIGATION-LIBRARIES_AND_DEMO   0.935355019 -1.251798894  3.12250893
## MEDICAL-LIBRARIES_AND_DEMO              -2.661898299 -4.541929136 -0.78186746
## NEWS_AND_MAGAZINES-LIBRARIES_AND_DEMO    0.077406572 -1.892008493  2.04682164
## PARENTING-LIBRARIES_AND_DEMO             0.961884558 -1.682967502  3.60673662
## PERSONALIZATION-LIBRARIES_AND_DEMO      -0.457987962 -2.346330416  1.43035449
## PHOTOGRAPHY-LIBRARIES_AND_DEMO           2.861055380  0.915296433  4.80681433
## PRODUCTIVITY-LIBRARIES_AND_DEMO          0.452892139 -1.436372090  2.34215637
## SHOPPING-LIBRARIES_AND_DEMO              2.125325544  0.093889772  4.15676132
## SOCIAL-LIBRARIES_AND_DEMO                1.076177553 -0.908537093  3.06089220
## SPORTS-LIBRARIES_AND_DEMO                0.335703052 -1.579504023  2.25091013
## TOOLS-LIBRARIES_AND_DEMO                 0.521650232 -1.270202023  2.31350249
## TRAVEL_AND_LOCAL-LIBRARIES_AND_DEMO      1.023527959 -0.984616822  3.03167274
## VIDEO_PLAYERS-LIBRARIES_AND_DEMO         2.156392070  0.054790238  4.25799390
## WEATHER-LIBRARIES_AND_DEMO               2.526427153  0.074113475  4.97874083
## MAPS_AND_NAVIGATION-LIFESTYLE            1.637514858  0.046143350  3.22888637
## MEDICAL-LIFESTYLE                       -1.959738461 -3.092583104 -0.82689382
## NEWS_AND_MAGAZINES-LIFESTYLE             0.779566411 -0.496136037  2.05526886
## PARENTING-LIFESTYLE                      1.664044396 -0.514042713  3.84213151
## PERSONALIZATION-LIFESTYLE                0.244171877 -0.902413603  1.39075736
## PHOTOGRAPHY-LIFESTYLE                    3.563215219  2.324345202  4.80208524
## PRODUCTIVITY-LIFESTYLE                   1.155051977  0.006949036  2.30315492
## SHOPPING-LIFESTYLE                       2.827485383  1.457977321  4.19699344
## SOCIAL-LIFESTYLE                         1.778337392  0.479140230  3.07753455
## SPORTS-LIFESTYLE                         1.037862891 -0.152447647  2.22817343
## TOOLS-LIFESTYLE                          1.223810071  0.244241012  2.20337913
## TRAVEL_AND_LOCAL-LIFESTYLE               1.725687797  0.390971911  3.06040368
## VIDEO_PLAYERS-LIFESTYLE                  2.858551909  1.386970685  4.33013313
## WEATHER-LIFESTYLE                        3.228586992  1.288828395  5.16834559
## MEDICAL-MAPS_AND_NAVIGATION             -3.597253319 -5.174843096 -2.01966354
## NEWS_AND_MAGAZINES-MAPS_AND_NAVIGATION  -0.857948447 -2.541061413  0.82516452
## PARENTING-MAPS_AND_NAVIGATION            0.026529538 -2.412633428  2.46569250
## PERSONALIZATION-MAPS_AND_NAVIGATION     -1.393342981 -2.980828661  0.19414270
## PHOTOGRAPHY-MAPS_AND_NAVIGATION          1.925700361  0.270329877  3.58107084
## PRODUCTIVITY-MAPS_AND_NAVIGATION        -0.482462881 -2.071044916  1.10611915
## SHOPPING-MAPS_AND_NAVIGATION             1.189970525 -0.565308613  2.94524966
## SOCIAL-MAPS_AND_NAVIGATION               0.140822534 -1.560167115  1.84181218
## SPORTS-MAPS_AND_NAVIGATION              -0.599651967 -2.219001126  1.01969719
## TOOLS-MAPS_AND_NAVIGATION               -0.413704787 -1.885101423  1.05769185
## TRAVEL_AND_LOCAL-MAPS_AND_NAVIGATION     0.088172939 -1.640097588  1.81644347
## VIDEO_PLAYERS-MAPS_AND_NAVIGATION        1.221037050 -0.614992311  3.05706641
## WEATHER-MAPS_AND_NAVIGATION              1.591072134 -0.637854947  3.81999921
## NEWS_AND_MAGAZINES-MEDICAL               2.739304871  1.480836372  3.99777337
## PARENTING-MEDICAL                        3.623782857  1.455744646  5.79182107
## PERSONALIZATION-MEDICAL                  2.203910338  1.076530858  3.33128982
## PHOTOGRAPHY-MEDICAL                      5.522953680  4.301837327  6.74407003
## PRODUCTIVITY-MEDICAL                     3.114790438  1.985867680  4.24371320
## SHOPPING-MEDICAL                         4.787223843  3.433754760  6.14069293
## SOCIAL-MEDICAL                           3.738075852  2.455796829  5.02035487
## SPORTS-MEDICAL                           2.997601351  1.825779946  4.16942276
## TOOLS-MEDICAL                            3.183548531  2.226531411  4.14056565
## TRAVEL_AND_LOCAL-MEDICAL                 3.685426258  2.367172593  5.00367992
## VIDEO_PLAYERS-MEDICAL                    4.818290369  3.361623789  6.27495695
## WEATHER-MEDICAL                          5.188325452  3.259857241  7.11679366
## PARENTING-NEWS_AND_MAGAZINES             0.884477986 -1.361511648  3.13046762
## PERSONALIZATION-NEWS_AND_MAGAZINES      -0.535394534 -1.806246312  0.73545724
## PHOTOGRAPHY-NEWS_AND_MAGAZINES           2.783648809  1.428951463  4.13834615
## PRODUCTIVITY-NEWS_AND_MAGAZINES          0.375485567 -0.896735461  1.64770659
## SHOPPING-NEWS_AND_MAGAZINES              2.047918972  0.572805990  3.52303195
## SOCIAL-NEWS_AND_MAGAZINES                0.998770981 -0.411306762  2.40884872
## SPORTS-NEWS_AND_MAGAZINES                0.258296480 -1.052140529  1.56873349
## TOOLS-NEWS_AND_MAGAZINES                 0.444243660 -0.678236415  1.56672374
## TRAVEL_AND_LOCAL-NEWS_AND_MAGAZINES      0.946121387 -0.496748141  2.38899091
## VIDEO_PLAYERS-NEWS_AND_MAGAZINES         2.078985498  0.508648822  3.64932217
## WEATHER-NEWS_AND_MAGAZINES               2.449020581  0.433314901  4.46472626
## PERSONALIZATION-PARENTING               -1.419872519 -3.595122152  0.75537711
## PHOTOGRAPHY-PARENTING                    1.899170823 -0.326104805  4.12444645
## PRODUCTIVITY-PARENTING                  -0.508992419 -2.685042295  1.66705746
## SHOPPING-PARENTING                       1.163440986 -1.137125255  3.46400723
## SOCIAL-PARENTING                         0.114292995 -2.145124178  2.37371017
## SPORTS-PARENTING                        -0.626181506 -2.824792857  1.57242985
## TOOLS-PARENTING                         -0.440234326 -2.532268800  1.65180015
## TRAVEL_AND_LOCAL-PARENTING               0.061643401 -2.218382738  2.34166954
## VIDEO_PLAYERS-PARENTING                  1.194507512 -1.168245966  3.55726099
## WEATHER-PARENTING                        1.564542595 -1.114956632  4.24404182
## PHOTOGRAPHY-PERSONALIZATION              3.319043342  2.085168784  4.55291790
## PRODUCTIVITY-PERSONALIZATION             0.910880100 -0.231830655  2.05359086
## SHOPPING-PERSONALIZATION                 2.583313506  1.218322722  3.94830429
## SOCIAL-PERSONALIZATION                   1.534165515  0.239730978  2.82860005
## SPORTS-PERSONALIZATION                   0.793691014 -0.391419396  1.97880142
## TOOLS-PERSONALIZATION                    0.979638194  0.006394623  1.95288176
## TRAVEL_AND_LOCAL-PERSONALIZATION         1.481515920  0.151435471  2.81159637
## VIDEO_PLAYERS-PERSONALIZATION            2.614380031  1.147001824  4.08175824
## WEATHER-PERSONALIZATION                  2.984415115  1.047843165  4.92098706
## PRODUCTIVITY-PHOTOGRAPHY                -2.408163242 -3.643448037 -1.17287845
## SHOPPING-PHOTOGRAPHY                    -0.735729836 -2.179107990  0.70764832
## SOCIAL-PHOTOGRAPHY                      -1.784877827 -3.161722563 -0.40803309
## SPORTS-PHOTOGRAPHY                      -2.525352328 -3.799961028 -1.25074363
## TOOLS-PHOTOGRAPHY                       -2.339405148 -3.419841959 -1.25896834
## TRAVEL_AND_LOCAL-PHOTOGRAPHY            -1.837527422 -3.247936815 -0.42711803
## VIDEO_PLAYERS-PHOTOGRAPHY               -0.704663311 -2.245227966  0.83590134
## WEATHER-PHOTOGRAPHY                     -0.334628227 -2.327227428  1.65797097
## SHOPPING-PRODUCTIVITY                    1.672433405  0.306167714  3.03869910
## SOCIAL-PRODUCTIVITY                      0.623285414 -0.672493453  1.91906428
## SPORTS-PRODUCTIVITY                     -0.117189087 -1.303767691  1.06938952
## TOOLS-PRODUCTIVITY                       0.068758093 -0.906272751  1.04378894
## TRAVEL_AND_LOCAL-PRODUCTIVITY            0.570635820 -0.760752968  1.90202461
## VIDEO_PLAYERS-PRODUCTIVITY               1.703499931  0.234935699  3.17206416
## WEATHER-PRODUCTIVITY                     2.073535014  0.136064235  4.01100579
## SOCIAL-SHOPPING                         -1.049147991 -2.544626122  0.44633014
## SPORTS-SHOPPING                         -1.789622492 -3.191542872 -0.38770211
## TOOLS-SHOPPING                          -1.603675312 -2.831721368 -0.37562926
## TRAVEL_AND_LOCAL-SHOPPING               -1.101797586 -2.628233982  0.42463881
## VIDEO_PLAYERS-SHOPPING                   0.031066526 -1.616383890  1.67851694
## WEATHER-SHOPPING                         0.401101609 -1.675242531  2.47744575
## SPORTS-SOCIAL                           -0.740474501 -2.073794299  0.59284530
## TOOLS-SOCIAL                            -0.554527321 -1.703639173  0.59458453
## TRAVEL_AND_LOCAL-SOCIAL                 -0.052649594 -1.516332963  1.41103377
## VIDEO_PLAYERS-SOCIAL                     1.080214517 -0.509267727  2.66969676
## WEATHER-SOCIAL                           1.450249600 -0.580406924  3.48090612
## TOOLS-SPORTS                             0.185947180 -0.838447257  1.21034162
## TRAVEL_AND_LOCAL-SPORTS                  0.687824906 -0.680127913  2.05577773
## VIDEO_PLAYERS-SPORTS                     1.820689018  0.318896849  3.32248119
## WEATHER-SPORTS                           2.190724101  0.227947539  4.15350066
## TRAVEL_AND_LOCAL-TOOLS                   0.501877726 -0.687244336  1.69099979
## VIDEO_PLAYERS-TOOLS                      1.634741838  0.293810788  2.97567289
## WEATHER-TOOLS                            2.004776921  0.162167848  3.84738599
## VIDEO_PLAYERS-TRAVEL_AND_LOCAL           1.132864111 -0.485779467  2.75150769
## WEATHER-TRAVEL_AND_LOCAL                 1.502899195 -0.550663356  3.55646175
## WEATHER-VIDEO_PLAYERS                    0.370035083 -1.775006630  2.51507680
##                                             p adj
## AUTO_AND_VEHICLES-ART_AND_DESIGN        0.9905770
## BEAUTY-ART_AND_DESIGN                   1.0000000
## BOOKS_AND_REFERENCE-ART_AND_DESIGN      0.7650853
## BUSINESS-ART_AND_DESIGN                 0.0000002
## COMICS-ART_AND_DESIGN                   1.0000000
## COMMUNICATION-ART_AND_DESIGN            1.0000000
## DATING-ART_AND_DESIGN                   0.5288198
## EDUCATION-ART_AND_DESIGN                0.1681188
## ENTERTAINMENT-ART_AND_DESIGN            0.0000287
## EVENTS-ART_AND_DESIGN                   0.0087638
## FAMILY-ART_AND_DESIGN                   0.9998655
## FINANCE-ART_AND_DESIGN                  0.9953429
## FOOD_AND_DRINK-ART_AND_DESIGN           1.0000000
## GAME-ART_AND_DESIGN                     0.1044645
## HEALTH_AND_FITNESS-ART_AND_DESIGN       1.0000000
## HOUSE_AND_HOME-ART_AND_DESIGN           1.0000000
## LIBRARIES_AND_DEMO-ART_AND_DESIGN       0.9999327
## LIFESTYLE-ART_AND_DESIGN                0.3599194
## MAPS_AND_NAVIGATION-ART_AND_DESIGN      1.0000000
## MEDICAL-ART_AND_DESIGN                  0.0000000
## NEWS_AND_MAGAZINES-ART_AND_DESIGN       0.9995685
## PARENTING-ART_AND_DESIGN                1.0000000
## PERSONALIZATION-ART_AND_DESIGN          0.7130524
## PHOTOGRAPHY-ART_AND_DESIGN              0.2657651
## PRODUCTIVITY-ART_AND_DESIGN             1.0000000
## SHOPPING-ART_AND_DESIGN                 0.9954025
## SOCIAL-ART_AND_DESIGN                   1.0000000
## SPORTS-ART_AND_DESIGN                   0.9999994
## TOOLS-ART_AND_DESIGN                    1.0000000
## TRAVEL_AND_LOCAL-ART_AND_DESIGN         1.0000000
## VIDEO_PLAYERS-ART_AND_DESIGN            0.9954212
## WEATHER-ART_AND_DESIGN                  0.9595914
## BEAUTY-AUTO_AND_VEHICLES                1.0000000
## BOOKS_AND_REFERENCE-AUTO_AND_VEHICLES   1.0000000
## BUSINESS-AUTO_AND_VEHICLES              0.0061882
## COMICS-AUTO_AND_VEHICLES                0.9621172
## COMMUNICATION-AUTO_AND_VEHICLES         0.7147198
## DATING-AUTO_AND_VEHICLES                1.0000000
## EDUCATION-AUTO_AND_VEHICLES             0.0000011
## ENTERTAINMENT-AUTO_AND_VEHICLES         0.0000000
## EVENTS-AUTO_AND_VEHICLES                0.7596794
## FAMILY-AUTO_AND_VEHICLES                0.9999998
## FINANCE-AUTO_AND_VEHICLES               1.0000000
## FOOD_AND_DRINK-AUTO_AND_VEHICLES        0.7097936
## GAME-AUTO_AND_VEHICLES                  0.0000000
## HEALTH_AND_FITNESS-AUTO_AND_VEHICLES    0.9899650
## HOUSE_AND_HOME-AUTO_AND_VEHICLES        0.5246592
## LIBRARIES_AND_DEMO-AUTO_AND_VEHICLES    1.0000000
## LIFESTYLE-AUTO_AND_VEHICLES             1.0000000
## MAPS_AND_NAVIGATION-AUTO_AND_VEHICLES   0.9589334
## MEDICAL-AUTO_AND_VEHICLES               0.0009049
## NEWS_AND_MAGAZINES-AUTO_AND_VEHICLES    1.0000000
## PARENTING-AUTO_AND_VEHICLES             0.9965941
## PERSONALIZATION-AUTO_AND_VEHICLES       1.0000000
## PHOTOGRAPHY-AUTO_AND_VEHICLES           0.0000003
## PRODUCTIVITY-AUTO_AND_VEHICLES          0.9998582
## SHOPPING-AUTO_AND_VEHICLES              0.0021874
## SOCIAL-AUTO_AND_VEHICLES                0.7003399
## SPORTS-AUTO_AND_VEHICLES                0.9999973
## TOOLS-AUTO_AND_VEHICLES                 0.9980023
## TRAVEL_AND_LOCAL-AUTO_AND_VEHICLES      0.7962716
## VIDEO_PLAYERS-AUTO_AND_VEHICLES         0.0033886
## WEATHER-AUTO_AND_VEHICLES               0.0046025
## BOOKS_AND_REFERENCE-BEAUTY              0.9999996
## BUSINESS-BEAUTY                         0.0030178
## COMICS-BEAUTY                           0.9999995
## COMMUNICATION-BEAUTY                    0.9999991
## DATING-BEAUTY                           0.9999057
## EDUCATION-BEAUTY                        0.0071951
## ENTERTAINMENT-BEAUTY                    0.0000003
## EVENTS-BEAUTY                           0.3761557
## FAMILY-BEAUTY                           1.0000000
## FINANCE-BEAUTY                          1.0000000
## FOOD_AND_DRINK-BEAUTY                   0.9999638
## GAME-BEAUTY                             0.0023255
## HEALTH_AND_FITNESS-BEAUTY               1.0000000
## HOUSE_AND_HOME-BEAUTY                   0.9982507
## LIBRARIES_AND_DEMO-BEAUTY               1.0000000
## LIFESTYLE-BEAUTY                        0.9997246
## MAPS_AND_NAVIGATION-BEAUTY              1.0000000
## MEDICAL-BEAUTY                          0.0005851
## NEWS_AND_MAGAZINES-BEAUTY               1.0000000
## PARENTING-BEAUTY                        1.0000000
## PERSONALIZATION-BEAUTY                  0.9999996
## PHOTOGRAPHY-BEAUTY                      0.0108336
## PRODUCTIVITY-BEAUTY                     1.0000000
## SHOPPING-BEAUTY                         0.4900368
## SOCIAL-BEAUTY                           0.9999972
## SPORTS-BEAUTY                           1.0000000
## TOOLS-BEAUTY                            1.0000000
## TRAVEL_AND_LOCAL-BEAUTY                 0.9999995
## VIDEO_PLAYERS-BEAUTY                    0.5087479
## WEATHER-BEAUTY                          0.3585261
## BUSINESS-BOOKS_AND_REFERENCE            0.0000062
## COMICS-BOOKS_AND_REFERENCE              0.6153117
## COMMUNICATION-BOOKS_AND_REFERENCE       0.0116424
## DATING-BOOKS_AND_REFERENCE              1.0000000
## EDUCATION-BOOKS_AND_REFERENCE           0.0000000
## ENTERTAINMENT-BOOKS_AND_REFERENCE       0.0000000
## EVENTS-BOOKS_AND_REFERENCE              0.6598727
## FAMILY-BOOKS_AND_REFERENCE              0.8791930
## FINANCE-BOOKS_AND_REFERENCE             0.9999968
## FOOD_AND_DRINK-BOOKS_AND_REFERENCE      0.0766260
## GAME-BOOKS_AND_REFERENCE                0.0000000
## HEALTH_AND_FITNESS-BOOKS_AND_REFERENCE  0.2849977
## HOUSE_AND_HOME-BOOKS_AND_REFERENCE      0.0620317
## LIBRARIES_AND_DEMO-BOOKS_AND_REFERENCE  1.0000000
## LIFESTYLE-BOOKS_AND_REFERENCE           1.0000000
## MAPS_AND_NAVIGATION-BOOKS_AND_REFERENCE 0.3224421
## MEDICAL-BOOKS_AND_REFERENCE             0.0000002
## NEWS_AND_MAGAZINES-BOOKS_AND_REFERENCE  0.9999271
## PARENTING-BOOKS_AND_REFERENCE           0.8665016
## PERSONALIZATION-BOOKS_AND_REFERENCE     1.0000000
## PHOTOGRAPHY-BOOKS_AND_REFERENCE         0.0000000
## PRODUCTIVITY-BOOKS_AND_REFERENCE        0.6761301
## SHOPPING-BOOKS_AND_REFERENCE            0.0000000
## SOCIAL-BOOKS_AND_REFERENCE              0.0179804
## SPORTS-BOOKS_AND_REFERENCE              0.9208618
## TOOLS-BOOKS_AND_REFERENCE               0.2515531
## TRAVEL_AND_LOCAL-BOOKS_AND_REFERENCE    0.0407043
## VIDEO_PLAYERS-BOOKS_AND_REFERENCE       0.0000003
## WEATHER-BOOKS_AND_REFERENCE             0.0000123
## COMICS-BUSINESS                         0.0000002
## COMMUNICATION-BUSINESS                  0.0000000
## DATING-BUSINESS                         0.0018729
## EDUCATION-BUSINESS                      0.0000000
## ENTERTAINMENT-BUSINESS                  0.0000000
## EVENTS-BUSINESS                         1.0000000
## FAMILY-BUSINESS                         0.0000000
## FINANCE-BUSINESS                        0.0000000
## FOOD_AND_DRINK-BUSINESS                 0.0000000
## GAME-BUSINESS                           0.0000000
## HEALTH_AND_FITNESS-BUSINESS             0.0000000
## HOUSE_AND_HOME-BUSINESS                 0.0000000
## LIBRARIES_AND_DEMO-BUSINESS             0.0003603
## LIFESTYLE-BUSINESS                      0.0000018
## MAPS_AND_NAVIGATION-BUSINESS            0.0000000
## MEDICAL-BUSINESS                        1.0000000
## NEWS_AND_MAGAZINES-BUSINESS             0.0000000
## PARENTING-BUSINESS                      0.0000012
## PERSONALIZATION-BUSINESS                0.0000000
## PHOTOGRAPHY-BUSINESS                    0.0000000
## PRODUCTIVITY-BUSINESS                   0.0000000
## SHOPPING-BUSINESS                       0.0000000
## SOCIAL-BUSINESS                         0.0000000
## SPORTS-BUSINESS                         0.0000000
## TOOLS-BUSINESS                          0.0000000
## TRAVEL_AND_LOCAL-BUSINESS               0.0000000
## VIDEO_PLAYERS-BUSINESS                  0.0000000
## WEATHER-BUSINESS                        0.0000000
## COMMUNICATION-COMICS                    1.0000000
## DATING-COMICS                           0.3865153
## EDUCATION-COMICS                        0.4614186
## ENTERTAINMENT-COMICS                    0.0004270
## EVENTS-COMICS                           0.0052979
## FAMILY-COMICS                           0.9974780
## FINANCE-COMICS                          0.9735980
## FOOD_AND_DRINK-COMICS                   1.0000000
## GAME-COMICS                             0.4272084
## HEALTH_AND_FITNESS-COMICS               1.0000000
## HOUSE_AND_HOME-COMICS                   1.0000000
## LIBRARIES_AND_DEMO-COMICS               0.9989513
## LIFESTYLE-COMICS                        0.2487601
## MAPS_AND_NAVIGATION-COMICS              1.0000000
## MEDICAL-COMICS                          0.0000000
## NEWS_AND_MAGAZINES-COMICS               0.9950291
## PARENTING-COMICS                        1.0000000
## PERSONALIZATION-COMICS                  0.5581835
## PHOTOGRAPHY-COMICS                      0.6589917
## PRODUCTIVITY-COMICS                     0.9999970
## SHOPPING-COMICS                         0.9999610
## SOCIAL-COMICS                           1.0000000
## SPORTS-COMICS                           0.9999416
## TOOLS-COMICS                            0.9999992
## TRAVEL_AND_LOCAL-COMICS                 1.0000000
## VIDEO_PLAYERS-COMICS                    0.9999533
## WEATHER-COMICS                          0.9971709
## DATING-COMMUNICATION                    0.0039840
## EDUCATION-COMMUNICATION                 0.0004714
## ENTERTAINMENT-COMMUNICATION             0.0000000
## EVENTS-COMMUNICATION                    0.0000211
## FAMILY-COMMUNICATION                    0.2663033
## FINANCE-COMMUNICATION                   0.2375070
## FOOD_AND_DRINK-COMMUNICATION            1.0000000
## GAME-COMMUNICATION                      0.0000000
## HEALTH_AND_FITNESS-COMMUNICATION        1.0000000
## HOUSE_AND_HOME-COMMUNICATION            1.0000000
## LIBRARIES_AND_DEMO-COMMUNICATION        0.9848282
## LIFESTYLE-COMMUNICATION                 0.0000258
## MAPS_AND_NAVIGATION-COMMUNICATION       1.0000000
## MEDICAL-COMMUNICATION                   0.0000000
## NEWS_AND_MAGAZINES-COMMUNICATION        0.6613765
## PARENTING-COMMUNICATION                 1.0000000
## PERSONALIZATION-COMMUNICATION           0.0012258
## PHOTOGRAPHY-COMMUNICATION               0.0000283
## PRODUCTIVITY-COMMUNICATION              0.9973931
## SHOPPING-COMMUNICATION                  0.4519509
## SOCIAL-COMMUNICATION                    1.0000000
## SPORTS-COMMUNICATION                    0.9728222
## TOOLS-COMMUNICATION                     0.9966165
## TRAVEL_AND_LOCAL-COMMUNICATION          1.0000000
## VIDEO_PLAYERS-COMMUNICATION             0.5496141
## WEATHER-COMMUNICATION                   0.5088768
## EDUCATION-DATING                        0.0000000
## ENTERTAINMENT-DATING                    0.0000000
## EVENTS-DATING                           0.9358509
## FAMILY-DATING                           0.5783899
## FINANCE-DATING                          0.9978471
## FOOD_AND_DRINK-DATING                   0.0289571
## GAME-DATING                             0.0000000
## HEALTH_AND_FITNESS-DATING               0.1191131
## HOUSE_AND_HOME-DATING                   0.0243873
## LIBRARIES_AND_DEMO-DATING               0.9999966
## LIFESTYLE-DATING                        1.0000000
## MAPS_AND_NAVIGATION-DATING              0.1469049
## MEDICAL-DATING                          0.0001288
## NEWS_AND_MAGAZINES-DATING               0.9911170
## PARENTING-DATING                        0.6644251
## PERSONALIZATION-DATING                  1.0000000
## PHOTOGRAPHY-DATING                      0.0000000
## PRODUCTIVITY-DATING                     0.3693932
## SHOPPING-DATING                         0.0000000
## SOCIAL-DATING                           0.0060913
## SPORTS-DATING                           0.6735607
## TOOLS-DATING                            0.1019151
## TRAVEL_AND_LOCAL-DATING                 0.0141499
## VIDEO_PLAYERS-DATING                    0.0000001
## WEATHER-DATING                          0.0000039
## ENTERTAINMENT-EDUCATION                 0.8131444
## EVENTS-EDUCATION                        0.0000000
## FAMILY-EDUCATION                        0.0000000
## FINANCE-EDUCATION                       0.0000000
## FOOD_AND_DRINK-EDUCATION                0.1064637
## GAME-EDUCATION                          1.0000000
## HEALTH_AND_FITNESS-EDUCATION            0.0000122
## HOUSE_AND_HOME-EDUCATION                0.7113674
## LIBRARIES_AND_DEMO-EDUCATION            0.0000278
## LIFESTYLE-EDUCATION                     0.0000000
## MAPS_AND_NAVIGATION-EDUCATION           0.0064487
## MEDICAL-EDUCATION                       0.0000000
## NEWS_AND_MAGAZINES-EDUCATION            0.0000000
## PARENTING-EDUCATION                     0.1597553
## PERSONALIZATION-EDUCATION               0.0000000
## PHOTOGRAPHY-EDUCATION                   1.0000000
## PRODUCTIVITY-EDUCATION                  0.0000001
## SHOPPING-EDUCATION                      0.9394882
## SOCIAL-EDUCATION                        0.0021257
## SPORTS-EDUCATION                        0.0000001
## TOOLS-EDUCATION                         0.0000000
## TRAVEL_AND_LOCAL-EDUCATION              0.0017366
## VIDEO_PLAYERS-EDUCATION                 0.9754629
## WEATHER-EDUCATION                       0.9999999
## EVENTS-ENTERTAINMENT                    0.0000000
## FAMILY-ENTERTAINMENT                    0.0000000
## FINANCE-ENTERTAINMENT                   0.0000000
## FOOD_AND_DRINK-ENTERTAINMENT            0.0000019
## GAME-ENTERTAINMENT                      0.0351154
## HEALTH_AND_FITNESS-ENTERTAINMENT        0.0000000
## HOUSE_AND_HOME-ENTERTAINMENT            0.0007895
## LIBRARIES_AND_DEMO-ENTERTAINMENT        0.0000000
## LIFESTYLE-ENTERTAINMENT                 0.0000000
## MAPS_AND_NAVIGATION-ENTERTAINMENT       0.0000000
## MEDICAL-ENTERTAINMENT                   0.0000000
## NEWS_AND_MAGAZINES-ENTERTAINMENT        0.0000000
## PARENTING-ENTERTAINMENT                 0.0000321
## PERSONALIZATION-ENTERTAINMENT           0.0000000
## PHOTOGRAPHY-ENTERTAINMENT               0.0853659
## PRODUCTIVITY-ENTERTAINMENT              0.0000000
## SHOPPING-ENTERTAINMENT                  0.0004107
## SOCIAL-ENTERTAINMENT                    0.0000000
## SPORTS-ENTERTAINMENT                    0.0000000
## TOOLS-ENTERTAINMENT                     0.0000000
## TRAVEL_AND_LOCAL-ENTERTAINMENT          0.0000000
## VIDEO_PLAYERS-ENTERTAINMENT             0.0013470
## WEATHER-ENTERTAINMENT                   0.1968678
## FAMILY-EVENTS                           0.0053488
## FINANCE-EVENTS                          0.0790048
## FOOD_AND_DRINK-EVENTS                   0.0001158
## GAME-EVENTS                             0.0000000
## HEALTH_AND_FITNESS-EVENTS               0.0005831
## HOUSE_AND_HOME-EVENTS                   0.0001042
## LIBRARIES_AND_DEMO-EVENTS               0.3938637
## LIFESTYLE-EVENTS                        0.8392205
## MAPS_AND_NAVIGATION-EVENTS              0.0007756
## MEDICAL-EVENTS                          1.0000000
## NEWS_AND_MAGAZINES-EVENTS               0.0598395
## PARENTING-EVENTS                        0.0159190
## PERSONALIZATION-EVENTS                  0.4999117
## PHOTOGRAPHY-EVENTS                      0.0000000
## PRODUCTIVITY-EVENTS                     0.0025341
## SHOPPING-EVENTS                         0.0000000
## SOCIAL-EVENTS                           0.0000276
## SPORTS-EVENTS                           0.0077265
## TOOLS-EVENTS                            0.0005574
## TRAVEL_AND_LOCAL-EVENTS                 0.0000598
## VIDEO_PLAYERS-EVENTS                    0.0000000
## WEATHER-EVENTS                          0.0000000
## FINANCE-FAMILY                          1.0000000
## FOOD_AND_DRINK-FAMILY                   0.7396645
## GAME-FAMILY                             0.0000000
## HEALTH_AND_FITNESS-FAMILY               0.9971968
## HOUSE_AND_HOME-FAMILY                   0.6024533
## LIBRARIES_AND_DEMO-FAMILY               1.0000000
## LIFESTYLE-FAMILY                        0.0364395
## MAPS_AND_NAVIGATION-FAMILY              0.9916099
## MEDICAL-FAMILY                          0.0000000
## NEWS_AND_MAGAZINES-FAMILY               1.0000000
## PARENTING-FAMILY                        0.9999864
## PERSONALIZATION-FAMILY                  0.5338795
## PHOTOGRAPHY-FAMILY                      0.0000000
## PRODUCTIVITY-FAMILY                     1.0000000
## SHOPPING-FAMILY                         0.0000002
## SOCIAL-FAMILY                           0.3886807
## SPORTS-FAMILY                           1.0000000
## TOOLS-FAMILY                            0.9976804
## TRAVEL_AND_LOCAL-FAMILY                 0.6318131
## VIDEO_PLAYERS-FAMILY                    0.0000042
## WEATHER-FAMILY                          0.0005028
## FOOD_AND_DRINK-FINANCE                  0.5268048
## GAME-FINANCE                            0.0000000
## HEALTH_AND_FITNESS-FINANCE              0.9599754
## HOUSE_AND_HOME-FINANCE                  0.3982222
## LIBRARIES_AND_DEMO-FINANCE              1.0000000
## LIFESTYLE-FINANCE                       0.9570430
## MAPS_AND_NAVIGATION-FINANCE             0.9289711
## MEDICAL-FINANCE                         0.0000000
## NEWS_AND_MAGAZINES-FINANCE              1.0000000
## PARENTING-FINANCE                       0.9989280
## PERSONALIZATION-FINANCE                 0.9999836
## PHOTOGRAPHY-FINANCE                     0.0000000
## PRODUCTIVITY-FINANCE                    0.9998231
## SHOPPING-FINANCE                        0.0000020
## SOCIAL-FINANCE                          0.2913978
## SPORTS-FINANCE                          0.9999999
## TOOLS-FINANCE                           0.9767887
## TRAVEL_AND_LOCAL-FINANCE                0.4646598
## VIDEO_PLAYERS-FINANCE                   0.0000129
## WEATHER-FINANCE                         0.0003335
## GAME-FOOD_AND_DRINK                     0.0217208
## HEALTH_AND_FITNESS-FOOD_AND_DRINK       0.9999986
## HOUSE_AND_HOME-FOOD_AND_DRINK           1.0000000
## LIBRARIES_AND_DEMO-FOOD_AND_DRINK       0.9701107
## LIFESTYLE-FOOD_AND_DRINK                0.0044235
## MAPS_AND_NAVIGATION-FOOD_AND_DRINK      1.0000000
## MEDICAL-FOOD_AND_DRINK                  0.0000000
## NEWS_AND_MAGAZINES-FOOD_AND_DRINK       0.7987142
## PARENTING-FOOD_AND_DRINK                1.0000000
## PERSONALIZATION-FOOD_AND_DRINK          0.0377018
## PHOTOGRAPHY-FOOD_AND_DRINK              0.1391968
## PRODUCTIVITY-FOOD_AND_DRINK             0.9969458
## SHOPPING-FOOD_AND_DRINK                 0.9980216
## SOCIAL-FOOD_AND_DRINK                   1.0000000
## SPORTS-FOOD_AND_DRINK                   0.9810361
## TOOLS-FOOD_AND_DRINK                    0.9978802
## TRAVEL_AND_LOCAL-FOOD_AND_DRINK         1.0000000
## VIDEO_PLAYERS-FOOD_AND_DRINK            0.9982062
## WEATHER-FOOD_AND_DRINK                  0.9759116
## HEALTH_AND_FITNESS-GAME                 0.0000000
## HOUSE_AND_HOME-GAME                     0.6729496
## LIBRARIES_AND_DEMO-GAME                 0.0000003
## LIFESTYLE-GAME                          0.0000000
## MAPS_AND_NAVIGATION-GAME                0.0001202
## MEDICAL-GAME                            0.0000000
## NEWS_AND_MAGAZINES-GAME                 0.0000000
## PARENTING-GAME                          0.1042735
## PERSONALIZATION-GAME                    0.0000000
## PHOTOGRAPHY-GAME                        1.0000000
## PRODUCTIVITY-GAME                       0.0000000
## SHOPPING-GAME                           0.8274017
## SOCIAL-GAME                             0.0000003
## SPORTS-GAME                             0.0000000
## TOOLS-GAME                              0.0000000
## TRAVEL_AND_LOCAL-GAME                   0.0000004
## VIDEO_PLAYERS-GAME                      0.9563801
## WEATHER-GAME                            1.0000000
## HOUSE_AND_HOME-HEALTH_AND_FITNESS       0.9994599
## LIBRARIES_AND_DEMO-HEALTH_AND_FITNESS   0.9999936
## LIFESTYLE-HEALTH_AND_FITNESS            0.0078511
## MAPS_AND_NAVIGATION-HEALTH_AND_FITNESS  1.0000000
## MEDICAL-HEALTH_AND_FITNESS              0.0000000
## NEWS_AND_MAGAZINES-HEALTH_AND_FITNESS   0.9989587
## PARENTING-HEALTH_AND_FITNESS            1.0000000
## PERSONALIZATION-HEALTH_AND_FITNESS      0.1125977
## PHOTOGRAPHY-HEALTH_AND_FITNESS          0.0000002
## PRODUCTIVITY-HEALTH_AND_FITNESS         1.0000000
## SHOPPING-HEALTH_AND_FITNESS             0.0484269
## SOCIAL-HEALTH_AND_FITNESS               1.0000000
## SPORTS-HEALTH_AND_FITNESS               1.0000000
## TOOLS-HEALTH_AND_FITNESS                1.0000000
## TRAVEL_AND_LOCAL-HEALTH_AND_FITNESS     1.0000000
## VIDEO_PLAYERS-HEALTH_AND_FITNESS        0.0836878
## WEATHER-HEALTH_AND_FITNESS              0.1222167
## LIBRARIES_AND_DEMO-HOUSE_AND_HOME       0.8787444
## LIFESTYLE-HOUSE_AND_HOME                0.0062336
## MAPS_AND_NAVIGATION-HOUSE_AND_HOME      1.0000000
## MEDICAL-HOUSE_AND_HOME                  0.0000000
## NEWS_AND_MAGAZINES-HOUSE_AND_HOME       0.6346359
## PARENTING-HOUSE_AND_HOME                1.0000000
## PERSONALIZATION-HOUSE_AND_HOME          0.0374836
## PHOTOGRAPHY-HOUSE_AND_HOME              0.8832210
## PRODUCTIVITY-HOUSE_AND_HOME             0.9665871
## SHOPPING-HOUSE_AND_HOME                 1.0000000
## SOCIAL-HOUSE_AND_HOME                   1.0000000
## SPORTS-HOUSE_AND_HOME                   0.9073393
## TOOLS-HOUSE_AND_HOME                    0.9728316
## TRAVEL_AND_LOCAL-HOUSE_AND_HOME         1.0000000
## VIDEO_PLAYERS-HOUSE_AND_HOME            1.0000000
## WEATHER-HOUSE_AND_HOME                  0.9999745
## LIFESTYLE-LIBRARIES_AND_DEMO            0.9999805
## MAPS_AND_NAVIGATION-LIBRARIES_AND_DEMO  0.9996290
## MEDICAL-LIBRARIES_AND_DEMO              0.0000398
## NEWS_AND_MAGAZINES-LIBRARIES_AND_DEMO   1.0000000
## PARENTING-LIBRARIES_AND_DEMO            0.9999877
## PERSONALIZATION-LIBRARIES_AND_DEMO      1.0000000
## PHOTOGRAPHY-LIBRARIES_AND_DEMO          0.0000125
## PRODUCTIVITY-LIBRARIES_AND_DEMO         1.0000000
## SHOPPING-LIBRARIES_AND_DEMO             0.0267194
## SOCIAL-LIBRARIES_AND_DEMO               0.9801305
## SPORTS-LIBRARIES_AND_DEMO               1.0000000
## TOOLS-LIBRARIES_AND_DEMO                0.9999999
## TRAVEL_AND_LOCAL-LIBRARIES_AND_DEMO     0.9919899
## VIDEO_PLAYERS-LIBRARIES_AND_DEMO        0.0353235
## WEATHER-LIBRARIES_AND_DEMO              0.0333740
## MAPS_AND_NAVIGATION-LIFESTYLE           0.0339397
## MEDICAL-LIFESTYLE                       0.0000000
## NEWS_AND_MAGAZINES-LIFESTYLE            0.9133558
## PARENTING-LIFESTYLE                     0.5089691
## PERSONALIZATION-LIFESTYLE               1.0000000
## PHOTOGRAPHY-LIFESTYLE                   0.0000000
## PRODUCTIVITY-LIFESTYLE                  0.0461900
## SHOPPING-LIFESTYLE                      0.0000000
## SOCIAL-LIFESTYLE                        0.0001038
## SPORTS-LIFESTYLE                        0.2151106
## TOOLS-LIFESTYLE                         0.0010087
## TRAVEL_AND_LOCAL-LIFESTYLE              0.0004523
## VIDEO_PLAYERS-LIFESTYLE                 0.0000000
## WEATHER-LIFESTYLE                       0.0000001
## MEDICAL-MAPS_AND_NAVIGATION             0.0000000
## NEWS_AND_MAGAZINES-MAPS_AND_NAVIGATION  0.9919776
## PARENTING-MAPS_AND_NAVIGATION           1.0000000
## PERSONALIZATION-MAPS_AND_NAVIGATION     0.2033549
## PHOTOGRAPHY-MAPS_AND_NAVIGATION         0.0044525
## PRODUCTIVITY-MAPS_AND_NAVIGATION        0.9999998
## SHOPPING-MAPS_AND_NAVIGATION            0.7718214
## SOCIAL-MAPS_AND_NAVIGATION              1.0000000
## SPORTS-MAPS_AND_NAVIGATION              0.9999816
## TOOLS-MAPS_AND_NAVIGATION               1.0000000
## TRAVEL_AND_LOCAL-MAPS_AND_NAVIGATION    1.0000000
## VIDEO_PLAYERS-MAPS_AND_NAVIGATION       0.8050502
## WEATHER-MAPS_AND_NAVIGATION             0.6680247
## NEWS_AND_MAGAZINES-MEDICAL              0.0000000
## PARENTING-MEDICAL                       0.0000001
## PERSONALIZATION-MEDICAL                 0.0000000
## PHOTOGRAPHY-MEDICAL                     0.0000000
## PRODUCTIVITY-MEDICAL                    0.0000000
## SHOPPING-MEDICAL                        0.0000000
## SOCIAL-MEDICAL                          0.0000000
## SPORTS-MEDICAL                          0.0000000
## TOOLS-MEDICAL                           0.0000000
## TRAVEL_AND_LOCAL-MEDICAL                0.0000000
## VIDEO_PLAYERS-MEDICAL                   0.0000000
## WEATHER-MEDICAL                         0.0000000
## PARENTING-NEWS_AND_MAGAZINES            0.9999303
## PERSONALIZATION-NEWS_AND_MAGAZINES      0.9997235
## PHOTOGRAPHY-NEWS_AND_MAGAZINES          0.0000000
## PRODUCTIVITY-NEWS_AND_MAGAZINES         0.9999999
## SHOPPING-NEWS_AND_MAGAZINES             0.0000701
## SOCIAL-NEWS_AND_MAGAZINES               0.6848545
## SPORTS-NEWS_AND_MAGAZINES               1.0000000
## TOOLS-NEWS_AND_MAGAZINES                0.9999226
## TRAVEL_AND_LOCAL-NEWS_AND_MAGAZINES     0.8274253
## VIDEO_PLAYERS-NEWS_AND_MAGAZINES        0.0002508
## WEATHER-NEWS_AND_MAGAZINES              0.0018549
## PERSONALIZATION-PARENTING               0.8342765
## PHOTOGRAPHY-PARENTING                   0.2557528
## PRODUCTIVITY-PARENTING                  1.0000000
## SHOPPING-PARENTING                      0.9929068
## SOCIAL-PARENTING                        1.0000000
## SPORTS-PARENTING                        1.0000000
## TOOLS-PARENTING                         1.0000000
## TRAVEL_AND_LOCAL-PARENTING              1.0000000
## VIDEO_PLAYERS-PARENTING                 0.9929422
## WEATHER-PARENTING                       0.9482818
## PHOTOGRAPHY-PERSONALIZATION             0.0000000
## PRODUCTIVITY-PERSONALIZATION            0.4064198
## SHOPPING-PERSONALIZATION                0.0000000
## SOCIAL-PERSONALIZATION                  0.0030906
## SPORTS-PERSONALIZATION                  0.7932928
## TOOLS-PERSONALIZATION                   0.0458761
## TRAVEL_AND_LOCAL-PERSONALIZATION        0.0098178
## VIDEO_PLAYERS-PERSONALIZATION           0.0000000
## WEATHER-PERSONALIZATION                 0.0000026
## PRODUCTIVITY-PHOTOGRAPHY                0.0000000
## SHOPPING-PHOTOGRAPHY                    0.9919803
## SOCIAL-PHOTOGRAPHY                      0.0004241
## SPORTS-PHOTOGRAPHY                      0.0000000
## TOOLS-PHOTOGRAPHY                       0.0000000
## TRAVEL_AND_LOCAL-PHOTOGRAPHY            0.0003752
## VIDEO_PLAYERS-PHOTOGRAPHY               0.9986926
## WEATHER-PHOTOGRAPHY                     1.0000000
## SHOPPING-PRODUCTIVITY                   0.0015812
## SOCIAL-PRODUCTIVITY                     0.9968572
## SPORTS-PRODUCTIVITY                     1.0000000
## TOOLS-PRODUCTIVITY                      1.0000000
## TRAVEL_AND_LOCAL-PRODUCTIVITY           0.9996128
## VIDEO_PLAYERS-PRODUCTIVITY              0.0047029
## WEATHER-PRODUCTIVITY                    0.0189358
## SOCIAL-SHOPPING                         0.7051079
## SPORTS-SHOPPING                         0.0006137
## TOOLS-SHOPPING                          0.0003542
## TRAVEL_AND_LOCAL-SHOPPING               0.6432888
## VIDEO_PLAYERS-SHOPPING                  1.0000000
## WEATHER-SHOPPING                        1.0000000
## SPORTS-SOCIAL                           0.9725029
## TOOLS-SOCIAL                            0.9966816
## TRAVEL_AND_LOCAL-SOCIAL                 1.0000000
## VIDEO_PLAYERS-SOCIAL                    0.7673559
## WEATHER-SOCIAL                          0.6669524
## TOOLS-SPORTS                            1.0000000
## TRAVEL_AND_LOCAL-SPORTS                 0.9935210
## VIDEO_PLAYERS-SPORTS                    0.0019416
## WEATHER-SPORTS                          0.0094762
## TRAVEL_AND_LOCAL-TOOLS                  0.9997134
## VIDEO_PLAYERS-TOOLS                     0.0017256
## WEATHER-TOOLS                           0.0145571
## VIDEO_PLAYERS-TRAVEL_AND_LOCAL          0.7100174
## WEATHER-TRAVEL_AND_LOCAL                0.6116588
## WEATHER-VIDEO_PLAYERS                   1.0000000

Observations:

- Installs for the Business category are significantly lower than those for Art and Design (adjusted p < 0.001).
- Installs for the Entertainment category are significantly higher than those for Art and Design (adjusted p < 0.001).
- The Medical category has significantly lower installs than Art and Design (adjusted p < 0.001).
- Installs in the Shopping category are significantly higher than in Auto and Vehicles (adjusted p = 0.0022).
- Installs for the Game category are significantly higher than those for Beauty (adjusted p = 0.0023).

Many other pairwise comparisons show no significant difference between categories, since their adjusted p-values are greater than 0.05.
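The observations above were read off the full Tukey table by hand. A minimal sketch of doing this programmatically in base R follows: `TukeyHSD()` returns a list with one matrix per factor, with columns `diff`, `lwr`, `upr` and `p adj`, so the significant pairs can be filtered and sorted directly. The data frame `toy` and its values are hypothetical stand-ins for `data_final`, used only so the example runs on its own; with the real data, the category ANOVA fit would be passed to `TukeyHSD()` instead.

```r
# Hypothetical toy data standing in for data_final (illustration only)
set.seed(1)
toy <- data.frame(
  Category = factor(rep(c("ART_AND_DESIGN", "BUSINESS", "GAME"), each = 30)),
  Installs = c(rnorm(30, 10), rnorm(30, 8), rnorm(30, 13))
)

fit <- aov(Installs ~ Category, data = toy)
tk  <- TukeyHSD(fit)$Category  # matrix: diff, lwr, upr, p adj

# Keep pairs whose family-wise adjusted p-value is below 0.05,
# ordered from most to least significant
sig <- tk[tk[, "p adj"] < 0.05, , drop = FALSE]
sig <- sig[order(sig[, "p adj"]), , drop = FALSE]

# For a row named "A-B", a positive diff means group A has the higher mean
result <- data.frame(
  pair      = rownames(sig),
  direction = ifelse(sig[, "diff"] > 0, "higher", "lower"),
  p_adj     = signif(sig[, "p adj"], 3),
  row.names = NULL
)
print(result)
```

Reading the sign of `diff` alongside `p adj` is what turns a row such as `BUSINESS-ART_AND_DESIGN` into a statement like "Business is lower than Art and Design".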

Android Version and Mean Installs

# Ensure that Android.Ver is treated as a factor
data_final$Android.Ver <- as.factor(data_final$Android.Ver)

# Perform one-way ANOVA
anova_result <- aov(Installs ~ Android.Ver, data = data_final)

# Summary of the ANOVA result
anova_summary <- summary(anova_result)
print(anova_summary)
##               Df    Sum Sq   Mean Sq F value Pr(>F)    
## Android.Ver   33 1.143e+18 3.463e+16   12.45 <2e-16 ***
## Residuals   9625 2.677e+19 2.781e+15                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

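The ANOVA is highly significant (F = 12.45 on 33 and 9625 degrees of freedom, p < 2e-16), so mean installs differ for at least one pair of Android.Ver groups. ANOVA does not say which groups differ, so a post-hoc test is needed.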
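As a sanity check on this reading, the F value in the table is the ratio of the two mean squares (each sum of squares divided by its degrees of freedom), and its p-value is the upper tail of the F distribution with 33 and 9625 degrees of freedom. The short sketch that follows recomputes both from the printed, rounded sums of squares using base R's `pf`; it is an illustration and not part of the original analysis.

```r
# Values copied from the printed ANOVA table (rounded to 4 significant digits)
df_between <- 33
ss_between <- 1.143e18
df_within  <- 9625
ss_within  <- 2.677e19

ms_between <- ss_between / df_between   # mean square for Android.Ver
ms_within  <- ss_within / df_within     # residual mean square
f_value    <- ms_between / ms_within    # F = MS_between / MS_within
p_value    <- pf(f_value, df_between, df_within, lower.tail = FALSE)

# F is about 12.45, matching the printed table
cat(sprintf("F = %.2f, p = %.3g\n", f_value, p_value))
```

Because the inputs are rounded, the recomputed F agrees with the table only to about three decimal places; the p-value is far below R's printed floor of 2e-16.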

# Run Tukey's HSD post-hoc test only if the ANOVA p-value is below 0.05
if (anova_summary[[1]][["Pr(>F)"]][1] < 0.05) {
  posthoc_result <- TukeyHSD(anova_result)
  print(posthoc_result)
}
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = Installs ~ Android.Ver, data = data_final)
## 
## $Android.Ver
##                                           diff        lwr       upr     p adj
## 1.5 and up-1.0 and up              1162085.500 -147882419 150206590 1.0000000
## 1.6 and up-1.0 and up               370117.690 -142958078 143698313 1.0000000
## 2.0 and up-1.0 and up              1079778.438 -145402165 147561722 1.0000000
## 2.0.1 and up-1.0 and up           14408580.000 -146727153 175544313 1.0000000
## 2.1 and up-1.0 and up              2342340.278 -140830516 145515196 1.0000000
## 2.2 - 7.1.1-1.0 and up              -29900.000 -246168798 246108998 1.0000000
## 2.2 and up-1.0 and up               375695.544 -142326020 143077412 1.0000000
## 2.3 and up-1.0 and up              2451520.145 -139887347 144790388 1.0000000
## 2.3.3 and up-1.0 and up            2551694.194 -140076258 145179647 1.0000000
## 3.0 and up-1.0 and up              1795152.126 -140927069 144517373 1.0000000
## 3.1 and up-1.0 and up              5073700.500 -150598207 160745608 1.0000000
## 3.2 and up-1.0 and up               200714.583 -145801745 146203174 1.0000000
## 4.0 and up-1.0 and up              4448261.639 -137770645 146667168 1.0000000
## 4.0.3 - 7.1.1-1.0 and up           7470000.000 -193501569 208441569 1.0000000
## 4.0.3 and up-1.0 and up            3638445.651 -138571747 145848638 1.0000000
## 4.1 - 7.1.1-1.0 and up            99970000.000 -146168898 346108898 0.9998945
## 4.1 and up-1.0 and up              5791588.460 -136381292 147964469 1.0000000
## 4.2 and up-1.0 and up              3635356.597 -138854502 146125215 1.0000000
## 4.3 and up-1.0 and up              2545998.738 -140203936 145295934 1.0000000
## 4.4 and up-1.0 and up              6012273.225 -136269706 148294253 1.0000000
## 4.4W and up-1.0 and up              -27520.909 -154515595 154460553 1.0000000
## 5.0 - 6.0-1.0 and up                -20000.000 -246158898 246118898 1.0000000
## 5.0 - 7.1.1-1.0 and up              -29995.000 -246168893 246108903 1.0000000
## 5.0 - 8.0-1.0 and up               9970000.000 -191001569 210941569 1.0000000
## 5.0 and up-1.0 and up              3137590.514 -139248053 145523234 1.0000000
## 5.1 and up-1.0 and up               261898.182 -148165442 148689238 1.0000000
## 6.0 and up-1.0 and up               944933.167 -143771136 145661003 1.0000000
## 7.0 - 7.1.1-1.0 and up              970000.000 -245168898 247108898 1.0000000
## 7.0 and up-1.0 and up              5291657.143 -140160886 150744200 1.0000000
## 7.1 and up-1.0 and up             33653333.333 -149807769 217114436 1.0000000
## 8.0 and up-1.0 and up               221684.333 -163870914 164314283 1.0000000
## NaN-1.0 and up                      -24500.000 -200996069 200947069 1.0000000
## Varies with device-1.0 and up     39444362.406 -102807468 181696193 1.0000000
## 1.6 and up-1.5 and up              -791967.810  -49450621  47866685 1.0000000
## 2.0 and up-1.5 and up               -82307.063  -57368018  57203404 1.0000000
## 2.0.1 and up-1.5 and up           13246494.500  -75011181 101504170 1.0000000
## 2.1 and up-1.5 and up              1180254.778  -47018911  49379420 1.0000000
## 2.2 - 7.1.1-1.5 and up            -1191985.500 -207126563 204742592 1.0000000
## 2.2 and up-1.5 and up              -786389.956  -47567504  45994724 1.0000000
## 2.3 and up-1.5 and up              1289434.644  -44372870  46951739 1.0000000
## 2.3.3 and up-1.5 and up            1389608.694  -45166011  47945228 1.0000000
## 3.0 and up-1.5 and up               633066.626  -46210558  47476692 1.0000000
## 3.1 and up-1.5 and up              3911615.000  -73924339  81747569 1.0000000
## 3.2 and up-1.5 and up              -961370.917  -57009664  55086922 1.0000000
## 4.0 and up-1.5 and up              3286176.139  -42000799  48573152 1.0000000
## 4.0.3 - 7.1.1-1.5 and up           6307914.500 -142736590 155352419 1.0000000
## 4.0.3 and up-1.5 and up            2476360.151  -42783243  47735963 1.0000000
## 4.1 - 7.1.1-1.5 and up            98807914.500 -107126663 304742492 0.9973799
## 4.1 and up-1.5 and up              4629502.960  -40512726  49771731 1.0000000
## 4.2 and up-1.5 and up              2473271.097  -43657550  48604093 1.0000000
## 4.3 and up-1.5 and up              1383913.238  -45544083  48311909 1.0000000
## 4.4 and up-1.5 and up              4850187.725  -40634475  50334851 1.0000000
## 4.4W and up-1.5 and up            -1189606.409  -76630032  74250819 1.0000000
## 5.0 - 6.0-1.5 and up              -1182085.500 -207116663 204752492 1.0000000
## 5.0 - 7.1.1-1.5 and up            -1192080.500 -207126658 204742497 1.0000000
## 5.0 - 8.0-1.5 and up               8807914.500 -140236590 157852419 1.0000000
## 5.0 and up-1.5 and up              1975505.014  -43832403  47783413 1.0000000
## 5.1 and up-1.5 and up              -900187.318  -62991799  61191424 1.0000000
## 6.0 and up-1.5 and up              -217152.333  -52823555  52389250 1.0000000
## 7.0 - 7.1.1-1.5 and up             -192085.500 -206126663 205742492 1.0000000
## 7.0 and up-1.5 and up              4129571.643  -50470200  58729343 1.0000000
## 7.1 and up-1.5 and up             32491247.833  -91938126 156920622 1.0000000
## 8.0 and up-1.5 and up              -940401.167  -94487575  92606773 1.0000000
## NaN-1.5 and up                    -1186585.500 -150231090 147857919 1.0000000
## Varies with device-1.5 and up     38282276.906   -7107988  83672541 0.2825077
## 2.0 and up-1.6 and up               709660.748  -39419631  40838953 1.0000000
## 2.0.1 and up-1.6 and up           14038462.310  -64179980  92256905 1.0000000
## 2.1 and up-1.6 and up              1972222.589  -23559463  27503908 1.0000000
## 2.2 - 7.1.1-1.6 and up             -400017.690 -202235984 201435949 1.0000000
## 2.2 and up-1.6 and up                 5577.854  -22736014  22747169 1.0000000
## 2.3 and up-1.6 and up              2081402.455  -18259528  22422333 1.0000000
## 2.3.3 and up-1.6 and up            2181576.504  -20092469  24455622 1.0000000
## 3.0 and up-1.6 and up              1425034.436  -21444871  24294940 1.0000000
## 3.1 and up-1.6 and up              4703582.810  -61531930  70939095 1.0000000
## 3.2 and up-1.6 and up              -169403.106  -38511527  38172721 1.0000000
## 4.0 and up-1.6 and up              4078143.949  -15405632  23561920 1.0000000
## 4.0.3 - 7.1.1-1.6 and up           7099882.310 -136228313 150428078 1.0000000
## 4.0.3 and up-1.6 and up            3268327.961  -16151740  22688396 1.0000000
## 4.1 - 7.1.1-1.6 and up            99599882.310 -102236084 301435849 0.9957732
## 4.1 and up-1.6 and up              5421470.770  -13723455  24566397 1.0000000
## 4.2 and up-1.6 and up              3265238.907  -18106707  24637185 1.0000000
## 4.3 and up-1.6 and up              2175881.048  -20866345  25218108 1.0000000
## 4.4 and up-1.6 and up              5642155.535  -14296800  25581111 1.0000000
## 4.4W and up-1.6 and up             -397638.599  -63800834  63005557 1.0000000
## 5.0 - 6.0-1.6 and up               -390117.690 -202226084 201445849 1.0000000
## 5.0 - 7.1.1-1.6 and up             -400112.690 -202236079 201435854 1.0000000
## 5.0 - 8.0-1.6 and up               9599882.310 -133728313 152928078 1.0000000
## 5.0 and up-1.6 and up              2767472.824  -17898244  23433189 1.0000000
## 5.1 and up-1.6 and up              -108219.508  -46842314  46625875 1.0000000
## 6.0 and up-1.6 and up               574815.477  -32533205  33682835 1.0000000
## 7.0 - 7.1.1-1.6 and up              599882.310 -201236084 202435849 1.0000000
## 7.0 and up-1.6 and up              4921539.453  -31270206  41113285 1.0000000
## 7.1 and up-1.6 and up             33283215.644  -84238597 150805028 1.0000000
## 8.0 and up-1.6 and up              -148433.356  -84289869  83993002 1.0000000
## NaN-1.6 and up                     -394617.690 -143722813 142933578 1.0000000
## Varies with device-1.6 and up     39074244.716   19351580  58796910 0.0000000
## 2.0.1 and up-2.0 and up           13328801.562  -70528893  97186496 1.0000000
## 2.1 and up-2.0 and up              1262561.841  -38308325  40833449 1.0000000
## 2.2 - 7.1.1-2.0 and up            -1109678.437 -205197271 202977914 1.0000000
## 2.2 and up-2.0 and up              -704082.894  -38534866  37126700 1.0000000
## 2.3 and up-2.0 and up              1371741.707  -35066447  37809930 1.0000000
## 2.3.3 and up-2.0 and up            1471915.757  -36079666  39023497 1.0000000
## 3.0 and up-2.0 and up               715373.688  -37192683  38623430 1.0000000
## 3.1 and up-2.0 and up              3993922.062  -68814946  76802790 1.0000000
## 3.2 and up-2.0 and up              -879063.854  -49706378  47948251 1.0000000
## 4.0 and up-2.0 and up              3368483.201  -32598247  39335213 1.0000000
## 4.0.3 - 7.1.1-2.0 and up           6390221.562 -140091722 152872165 1.0000000
## 4.0.3 and up-2.0 and up            2558667.213  -33373591  38490926 1.0000000
## 4.1 - 7.1.1-2.0 and up            98890221.562 -105197371 302977814 0.9968951
## 4.1 and up-2.0 and up              4711810.023  -31072493  40496113 1.0000000
## 4.2 and up-2.0 and up              2555578.159  -34468039  39579195 1.0000000
## 4.3 and up-2.0 and up              1466220.300  -36546045  39478486 1.0000000
## 4.4 and up-2.0 and up              4932494.787  -31282835  41147825 1.0000000
## 4.4W and up-2.0 and up            -1107299.347  -71349404  69134806 1.0000000
## 5.0 - 6.0-2.0 and up              -1099778.438 -205187371 202987814 1.0000000
## 5.0 - 7.1.1-2.0 and up            -1109773.438 -205197366 202977819 1.0000000
## 5.0 - 8.0-2.0 and up               8890221.562 -137591722 155372165 1.0000000
## 5.0 and up-2.0 and up              2057812.076  -34562674  38678298 1.0000000
## 5.1 and up-2.0 and up              -817880.256  -56478133  54842372 1.0000000
## 6.0 and up-2.0 and up              -134845.271  -44969309  44699618 1.0000000
## 7.0 - 7.1.1-2.0 and up             -109778.438 -204197371 203977814 1.0000000
## 7.0 and up-2.0 and up              4211878.705  -42945629  51369387 1.0000000
## 7.1 and up-2.0 and up             32573554.896  -88774558 153921668 1.0000000
## 8.0 and up-2.0 and up              -858094.104  -90265976  88549788 1.0000000
## NaN-2.0 and up                    -1104278.437 -147586222 145377665 1.0000000
## Varies with device-2.0 and up     38364583.969    2267885  74461283 0.0209578
## 2.1 and up-2.0.1 and up          -12066239.722  -89999671  65867192 1.0000000
## 2.2 - 7.1.1-2.0.1 and up         -14438480.000 -229286124 200409164 1.0000000
## 2.2 and up-2.0.1 and up          -14032884.456  -91097356  63031587 1.0000000
## 2.3 and up-2.0.1 and up          -11957059.856  -88347545  64433426 1.0000000
## 2.3.3 and up-2.0.1 and up        -11856885.806  -88784683  65070911 1.0000000
## 3.0 and up-2.0.1 and up          -12613427.874  -89715862  64489007 1.0000000
## 3.1 and up-2.0.1 and up           -9334879.500 -108374750  89704991 1.0000000
## 3.2 and up-2.0.1 and up          -14207865.417  -97225161  68809430 1.0000000
## 4.0 and up-2.0.1 and up           -9960318.361  -86127046  66206409 1.0000000
## 4.0.3 - 7.1.1-2.0.1 and up        -6938580.000 -168074313 154197153 1.0000000
## 4.0.3 and up-2.0.1 and up        -10770134.349  -86920590  65380321 1.0000000
## 4.1 - 7.1.1-2.0.1 and up          85561420.000 -129286224 300409064 0.9999305
## 4.1 and up-2.0.1 and up           -8616991.540  -84697745  67463762 1.0000000
## 4.2 and up-2.0.1 and up          -10773223.403  -87444684  65898237 1.0000000
## 4.3 and up-2.0.1 and up          -11862581.262  -89016305  65291142 1.0000000
## 4.4 and up-2.0.1 and up           -8396306.775  -84680740  67888127 1.0000000
## 4.4W and up-2.0.1 and up         -14436100.909 -111604604  82732402 1.0000000
## 5.0 - 6.0-2.0.1 and up           -14428580.000 -229276224 200419064 1.0000000
## 5.0 - 7.1.1-2.0.1 and up         -14438575.000 -229286219 200409069 1.0000000
## 5.0 - 8.0-2.0.1 and up            -4438580.000 -165574313 156697153 1.0000000
## 5.0 and up-2.0.1 and up          -11270989.486  -87748598  65206619 1.0000000
## 5.1 and up-2.0.1 and up          -14146681.818 -101358082  73064719 1.0000000
## 6.0 and up-2.0.1 and up          -13463646.833  -94197121  67269827 1.0000000
## 7.0 - 7.1.1-2.0.1 and up         -13438580.000 -228286224 201409064 1.0000000
## 7.0 and up-2.0.1 and up           -9116922.857  -91163222  72929377 1.0000000
## 7.1 and up-2.0.1 and up           19244753.333 -119438805 157928311 1.0000000
## 8.0 and up-2.0.1 and up          -14186895.667 -125997155  97623363 1.0000000
## NaN-2.0.1 and up                 -14433080.000 -175568813 146702653 1.0000000
## Varies with device-2.0.1 and up   25035782.406  -51192403 101263968 0.9999992
## 2.2 - 7.1.1-2.1 and up            -2372240.278 -204097926 199353446 1.0000000
## 2.2 and up-2.1 and up             -1966644.734  -23707735  19774446 1.0000000
## 2.3 and up-2.1 and up               109179.866  -19106659  19325019 1.0000000
## 2.3.3 and up-2.1 and up             209353.916  -21042190  21460898 1.0000000
## 3.0 and up-2.1 and up              -547188.153  -22422462  21328086 1.0000000
## 3.1 and up-2.1 and up              2731360.222  -63167336  68630056 1.0000000
## 3.2 and up-2.1 and up             -2141625.695  -39898923  35615672 1.0000000
## 4.0 and up-2.1 and up              2105921.361  -16200159  20412001 1.0000000
## 4.0.3 - 7.1.1-2.1 and up           5127659.722 -138045196 148300516 1.0000000
## 4.0.3 and up-2.1 and up            1296105.373  -16942153  19534364 1.0000000
## 4.1 - 7.1.1-2.1 and up            97627659.722 -104098026 299353346 0.9969588
## 4.1 and up-2.1 and up              3449248.182  -14495757  21394253 1.0000000
## 4.2 and up-2.1 and up              1293016.319  -19011050  21597083 1.0000000
## 4.3 and up-2.1 and up               203658.459  -21851709  22259026 1.0000000
## 4.4 and up-2.1 and up              3669932.947  -15119878  22459744 1.0000000
## 4.4W and up-2.1 and up            -2369861.187  -65421112  60681389 1.0000000
## 5.0 - 6.0-2.1 and up              -2362340.278 -204088026 199363346 1.0000000
## 5.0 - 7.1.1-2.1 and up            -2372335.278 -204098021 199353351 1.0000000
## 5.0 - 8.0-2.1 and up               7627659.722 -135545196 150800516 1.0000000
## 5.0 and up-2.1 and up               795250.235  -18764066  20354567 1.0000000
## 5.1 and up-2.1 and up             -2080442.096  -48335935  44175051 1.0000000
## 6.0 and up-2.1 and up             -1397407.112  -33826345  31031531 1.0000000
## 7.0 - 7.1.1-2.1 and up            -1372340.278 -203098026 200353346 1.0000000
## 7.0 and up-2.1 and up              2949316.865  -32622265  38520899 1.0000000
## 7.1 and up-2.1 and up             31310993.055  -86021319 148643305 1.0000000
## 8.0 and up-2.1 and up             -2120655.945  -85997209  81755898 1.0000000
## NaN-2.1 and up                    -2366840.278 -145539696 140806016 1.0000000
## Varies with device-2.1 and up     37102022.128   18541889  55662155 0.0000000
## 2.2 and up-2.2 - 7.1.1              405595.544 -200985977 201797168 1.0000000
## 2.3 and up-2.2 - 7.1.1             2481420.144 -198653209 203616049 1.0000000
## 2.3.3 and up-2.2 - 7.1.1           2581594.194 -198757718 203920906 1.0000000
## 3.0 and up-2.2 - 7.1.1             1825052.126 -199581050 203231154 1.0000000
## 3.1 and up-2.2 - 7.1.1             5103600.500 -205677159 215884360 1.0000000
## 3.2 and up-2.2 - 7.1.1              230614.583 -203513107 203974336 1.0000000
## 4.0 and up-2.2 - 7.1.1             4478161.639 -196571591 205527914 1.0000000
## 4.0.3 - 7.1.1-2.2 - 7.1.1          7499900.000 -238638998 253638798 1.0000000
## 4.0.3 and up-2.2 - 7.1.1           3668345.651 -197375243 204711934 1.0000000
## 4.1 - 7.1.1-2.2 - 7.1.1           99999900.000 -184216818 384216618 0.9999958
## 4.1 and up-2.2 - 7.1.1             5821488.460 -195195709 206838686 1.0000000
## 4.2 and up-2.2 - 7.1.1             3665256.597 -197576254 204906767 1.0000000
## 4.3 and up-2.2 - 7.1.1             2575898.738 -198849844 204001641 1.0000000
## 4.4 and up-2.2 - 7.1.1             6042173.225 -195052201 207136548 1.0000000
## 4.4W and up-2.2 - 7.1.1               2379.091 -209905579 209910337 1.0000000
## 5.0 - 6.0-2.2 - 7.1.1                 9900.000 -284206818 284226618 1.0000000
## 5.0 - 7.1.1-2.2 - 7.1.1                -95.000 -284216813 284216623 1.0000000
## 5.0 - 8.0-2.2 - 7.1.1              9999900.000 -236138998 256138798 1.0000000
## 5.0 and up-2.2 - 7.1.1             3167490.514 -198000244 204335225 1.0000000
## 5.1 and up-2.2 - 7.1.1              291798.182 -205196550 205780146 1.0000000
## 6.0 and up-2.2 - 7.1.1              974833.167 -201849047 203798714 1.0000000
## 7.0 - 7.1.1-2.2 - 7.1.1             999900.000 -283216818 285216618 1.0000000
## 7.0 and up-2.2 - 7.1.1             5321557.143 -198028456 208671570 1.0000000
## 7.1 and up-2.2 - 7.1.1            33683233.333 -198378745 265745212 1.0000000
## 8.0 and up-2.2 - 7.1.1              251584.333 -216822520 217325688 1.0000000
## NaN-2.2 - 7.1.1                       5400.000 -246133498 246144298 1.0000000
## Varies with device-2.2 - 7.1.1    39474262.406 -161598782 240547306 1.0000000
## 2.3 and up-2.2 and up              2075824.601  -13239576  17391226 1.0000000
## 2.3.3 and up-2.2 and up            2175998.650  -15626843  19978840 1.0000000
## 3.0 and up-2.2 and up              1419456.582  -17123484  19962397 1.0000000
## 3.1 and up-2.2 and up              4698004.956  -60170718  69566728 1.0000000
## 3.2 and up-2.2 and up              -174980.961  -36104447  35754486 1.0000000
## 4.0 and up-2.2 and up              4072566.095  -10084609  18229741 1.0000000
## 4.0.3 - 7.1.1-2.2 and up           7094304.456 -135607412 149796020 1.0000000
## 4.0.3 and up-2.2 and up            3262750.107  -10806618  17332118 1.0000000
## 4.1 - 7.1.1-2.2 and up            99594304.456 -101797268 300985877 0.9956192
## 4.1 and up-2.2 and up              5415892.916   -8271190  19102976 0.9999395
## 4.2 and up-2.2 and up              3259661.053  -13400715  19920037 1.0000000
## 4.3 and up-2.2 and up              2170303.194  -16584757  20925363 1.0000000
## 4.4 and up-2.2 and up              5636577.681   -9140771  20413926 0.9999729
## 4.4W and up-2.2 and up             -403216.453  -62377189  61570756 1.0000000
## 5.0 - 6.0-2.2 and up               -395695.544 -201787268 200995877 1.0000000
## 5.0 - 7.1.1-2.2 and up             -405690.544 -201797263 200985882 1.0000000
## 5.0 - 8.0-2.2 and up               9594304.456 -133107412 152296020 1.0000000
## 5.0 and up-2.2 and up              2761894.970  -12982307  18506097 1.0000000
## 5.1 and up-2.2 and up              -113797.362  -44889724  44662130 1.0000000
## 6.0 and up-2.2 and up               569237.623  -29711929  30850404 1.0000000
## 7.0 - 7.1.1-2.2 and up              594304.456 -200797268 201985877 1.0000000
## 7.0 and up-2.2 and up              4915961.599  -28709185  38541108 1.0000000
## 7.1 and up-2.2 and up             33277637.789  -83479308 150034584 1.0000000
## 8.0 and up-2.2 and up              -154011.211  -83223797  82915774 1.0000000
## NaN-2.2 and up                     -400195.544 -143101912 142301520 1.0000000
## Varies with device-2.2 and up     39068666.862   24584483  53552850 0.0000000
## 2.3.3 and up-2.3 and up             100174.050  -14511966  14712314 1.0000000
## 3.0 and up-2.3 and up              -656368.019  -16161661  14848925 1.0000000
## 3.1 and up-2.3 and up              2622180.356  -61444384  66688744 1.0000000
## 3.2 and up-2.3 and up             -2250805.561  -36710928  32209317 1.0000000
## 4.0 and up-2.3 and up              1996741.494   -7852069  11845552 1.0000000
## 4.0.3 - 7.1.1-2.3 and up           5018479.856 -137320388 147357347 1.0000000
## 4.0.3 and up-2.3 and up            1186925.506   -8535244  10909095 1.0000000
## 4.1 - 7.1.1-2.3 and up            97518479.856 -103616149 298653109 0.9968627
## 4.1 and up-2.3 and up              3340068.316   -5820156  12500293 0.9999903
## 4.2 and up-2.3 and up              1183836.452  -12012436  14380109 1.0000000
## 4.3 and up-2.3 and up                94478.593  -15663876  15852833 1.0000000
## 4.4 and up-2.3 and up              3560753.080   -7160417  14281923 0.9999989
## 4.4W and up-2.3 and up            -2479041.054  -63612884  58654802 1.0000000
## 5.0 - 6.0-2.3 and up              -2471520.144 -203606149 198663109 1.0000000
## 5.0 - 7.1.1-2.3 and up            -2481515.144 -203616144 198653114 1.0000000
## 5.0 - 8.0-2.3 and up               7518479.856 -134820388 149857347 1.0000000
## 5.0 and up-2.3 and up               686070.369  -11332805  12704946 1.0000000
## 5.1 and up-2.3 and up             -2189621.963  -45795322  41416078 1.0000000
## 6.0 and up-2.3 and up             -1506586.978  -30028903  27015729 1.0000000
## 7.0 - 7.1.1-2.3 and up            -1481520.144 -202616149 199653109 1.0000000
## 7.0 and up-2.3 and up              2840136.998  -29210198  34890472 1.0000000
## 7.1 and up-2.3 and up             31201813.189  -85111376 147515002 1.0000000
## 8.0 and up-2.3 and up             -2229835.811  -84674743  80215071 1.0000000
## NaN-2.3 and up                    -2476020.144 -144814888 139862847 1.0000000
## Varies with device-2.3 and up     36992842.262   26679500  47306185 0.0000000
## 3.0 and up-2.3.3 and up            -756542.069  -18723005  17209921 1.0000000
## 3.1 and up-2.3.3 and up            2522006.306  -62184286  67228299 1.0000000
## 3.2 and up-2.3.3 and up           -2350979.611  -37986351  33284392 1.0000000
## 4.0 and up-2.3.3 and up            1896567.445  -11496667  15289802 1.0000000
## 4.0.3 - 7.1.1-2.3.3 and up         4918305.806 -137709647 147546258 1.0000000
## 4.0.3 and up-2.3.3 and up          1086751.457  -12213634  14387137 1.0000000
## 4.1 - 7.1.1-2.3.3 and up          97418305.806 -103921006 298757618 0.9969708
## 4.1 and up-2.3.3 and up            3239894.266   -9655429  16135218 1.0000000
## 4.2 and up-2.3.3 and up            1083662.403  -14932619  17099944 1.0000000
## 4.3 and up-2.3.3 and up              -5695.457  -18191002  18179612 1.0000000
## 4.4 and up-2.3.3 and up            3460579.031  -10586597  17507755 1.0000000
## 4.4W and up-2.3.3 and up          -2579215.103  -64383150  59224720 1.0000000
## 5.0 - 6.0-2.3.3 and up            -2571694.194 -203911006 198767618 1.0000000
## 5.0 - 7.1.1-2.3.3 and up          -2581689.194 -203921001 198757623 1.0000000
## 5.0 - 8.0-2.3.3 and up             7418305.806 -135209647 150046258 1.0000000
## 5.0 and up-2.3.3 and up             585896.320  -14475081  15646873 1.0000000
## 5.1 and up-2.3.3 and up           -2289796.012  -46830078  42250486 1.0000000
## 6.0 and up-2.3.3 and up           -1606761.027  -31538386  28324864 1.0000000
## 7.0 - 7.1.1-2.3.3 and up          -1581694.194 -202921006 199757618 1.0000000
## 7.0 and up-2.3.3 and up            2739962.949  -30570750  36050676 1.0000000
## 7.1 and up-2.3.3 and up           31101639.139  -85565141 147768419 1.0000000
## 8.0 and up-2.3.3 and up           -2330009.861  -85273017  80612997 1.0000000
## NaN-2.3.3 and up                  -2576194.194 -145204147 140051758 1.0000000
## Varies with device-2.3.3 and up   36892668.212   23154230  50631107 0.0000000
## 3.1 and up-3.0 and up              3278548.374  -61635270  68192366 1.0000000
## 3.2 and up-3.0 and up             -1594437.542  -37605258  34416383 1.0000000
## 4.0 and up-3.0 and up              2653109.513  -11709279  17015498 1.0000000
## 4.0.3 - 7.1.1-3.0 and up           5674847.874 -137047373 148397069 1.0000000
## 4.0.3 and up-3.0 and up            1843293.525  -12432551  16119138 1.0000000
## 4.1 - 7.1.1-3.0 and up            98174847.874 -103231254 299580950 0.9965625
## 4.1 and up-3.0 and up              3996436.334   -9902803  17895676 1.0000000
## 4.2 and up-3.0 and up              1840204.471  -14994900  18675309 1.0000000
## 4.3 and up-3.0 and up               750846.612  -18159597  19661290 1.0000000
## 4.4 and up-3.0 and up              4217121.099  -10756944  19191186 1.0000000
## 4.4W and up-3.0 and up            -1822673.035  -63843846  60198500 1.0000000
## 5.0 - 6.0-3.0 and up              -1815152.126 -203221254 199590950 1.0000000
## 5.0 - 7.1.1-3.0 and up            -1825147.126 -203231249 199580955 1.0000000
## 5.0 - 8.0-3.0 and up               8174847.874 -134547373 150897069 1.0000000
## 5.0 and up-3.0 and up              1342438.388  -14586545  17271422 1.0000000
## 5.1 and up-3.0 and up             -1533253.944  -46374488  43307980 1.0000000
## 6.0 and up-3.0 and up              -850218.959  -31227870  29527432 1.0000000
## 7.0 - 7.1.1-3.0 and up             -825152.126 -202231254 200580950 1.0000000
## 7.0 and up-3.0 and up              3496505.017  -30215557  37208567 1.0000000
## 7.1 and up-3.0 and up             31858181.208  -84923826 148640188 1.0000000
## 8.0 and up-3.0 and up             -1573467.792  -84678473  81531537 1.0000000
## NaN-3.0 and up                    -1819652.126 -144541873 140902569 1.0000000
## Varies with device-3.0 and up     37649210.281   22964382  52334038 0.0000000
## 3.2 and up-3.1 and up             -4872985.917  -76712318  66966347 1.0000000
## 4.0 and up-3.1 and up              -625438.861  -64425037  63174159 1.0000000
## 4.0.3 - 7.1.1-3.1 and up           2396299.500 -153275608 158068207 1.0000000
## 4.0.3 and up-3.1 and up           -1435254.849  -65215426  62344916 1.0000000
## 4.1 - 7.1.1-3.1 and up            94896299.500 -115884460 305677059 0.9991734
## 4.1 and up-3.1 and up               717887.960  -62979046  64414822 1.0000000
## 4.2 and up-3.1 and up             -1438343.903  -65839674  62962986 1.0000000
## 4.3 and up-3.1 and up             -2527701.762  -67502431  62447027 1.0000000
## 4.4 and up-3.1 and up               938572.725  -63001502  64878647 1.0000000
## 4.4W and up-3.1 and up            -5101221.409  -92912020  82709577 1.0000000
## 5.0 - 6.0-3.1 and up              -5093700.500 -215874460 205687059 1.0000000
## 5.0 - 7.1.1-3.1 and up            -5103695.500 -215884455 205677064 1.0000000
## 5.0 - 8.0-3.1 and up               4896299.500 -150775608 160568207 1.0000000
## 5.0 and up-3.1 and up             -1936109.986  -66106532  62234312 1.0000000
## 5.1 and up-3.1 and up             -4811802.318  -81459351  71835747 1.0000000
## 6.0 and up-3.1 and up             -4128767.333  -73316282  65058747 1.0000000
## 7.0 - 7.1.1-3.1 and up            -4103700.500 -214884460 206677059 1.0000000
## 7.0 and up-3.1 and up               217956.643  -70497060  70932973 1.0000000
## 7.1 and up-3.1 and up             28579632.833 -103716050 160875315 1.0000000
## 8.0 and up-3.1 and up             -4852016.167 -108633288  98929256 1.0000000
## NaN-3.1 and up                    -5098200.500 -160770108 150573707 1.0000000
## Varies with device-3.1 and up     34370661.906  -29502296  98243619 0.9837049
## 4.0 and up-3.2 and up              4247547.056  -29713668  38208762 1.0000000
## 4.0.3 - 7.1.1-3.2 and up           7269285.417 -138733174 153271745 1.0000000
## 4.0.3 and up-3.2 and up            3437731.068  -30486974  37362436 1.0000000
## 4.1 - 7.1.1-3.2 and up            99769285.417 -103974436 303513007 0.9962860
## 4.1 and up-3.2 and up              5590873.877  -28177081  39358829 1.0000000
## 4.2 and up-3.2 and up              3434642.013  -31643936  38513220 1.0000000
## 4.3 and up-3.2 and up              2345284.154  -33775219  38465788 1.0000000
## 4.4 and up-3.2 and up              5811558.642  -28412827  40035944 1.0000000
## 4.4W and up-3.2 and up             -228235.492  -69464871  69008400 1.0000000
## 5.0 - 6.0-3.2 and up               -220714.583 -203964436 203523007 1.0000000
## 5.0 - 7.1.1-3.2 and up             -230709.583 -203974431 203513012 1.0000000
## 5.0 - 8.0-3.2 and up               9769285.417 -136233174 155771745 1.0000000
## 5.0 and up-3.2 and up              2936875.930  -31715951  37589703 1.0000000
## 5.1 and up-3.2 and up                61183.598  -54324680  54447047 1.0000000
## 6.0 and up-3.2 and up               744218.583  -42497978  43986415 1.0000000
## 7.0 - 7.1.1-3.2 and up              769285.417 -202974436 204513007 1.0000000
## 7.0 and up-3.2 and up              5090942.560  -40555405  50737290 1.0000000
## 7.1 and up-3.2 and up             33452618.750  -87316264 154221501 1.0000000
## 8.0 and up-3.2 and up                20969.750  -88599162  88641102 1.0000000
## NaN-3.2 and up                     -225214.583 -146227674 145777245 1.0000000
## Varies with device-3.2 and up     39243647.823    5144820  73342476 0.0053696
## 4.0.3 - 7.1.1-4.0 and up           3021738.361 -139197168 145240645 1.0000000
## 4.0.3 and up-4.0 and up            -809815.988   -8580573   6960941 1.0000000
## 4.1 - 7.1.1-4.0 and up            95521738.361 -105528014 296571491 0.9977933
## 4.1 and up-4.0 and up              1343326.821   -5711728   8398381 1.0000000
## 4.2 and up-4.0 and up              -812905.042  -12645305  11019495 1.0000000
## 4.3 and up-4.0 and up             -1902262.901  -16537488  12732962 1.0000000
## 4.4 and up-4.0 and up              1564011.586   -7425292  10553315 1.0000000
## 4.4W and up-4.0 and up            -4475782.548  -65329795  56378230 1.0000000
## 5.0 - 6.0-4.0 and up              -4468261.639 -205518014 196581491 1.0000000
## 5.0 - 7.1.1-4.0 and up            -4478256.639 -205528009 196571496 1.0000000
## 5.0 - 8.0-4.0 and up               5521738.361 -136697168 147740645 1.0000000
## 5.0 and up-4.0 and up             -1310671.125  -11813883   9192541 1.0000000
## 5.1 and up-4.0 and up             -4186363.457  -47398875  39026148 1.0000000
## 6.0 and up-4.0 and up             -3503328.472  -31420825  24414168 1.0000000
## 7.0 - 7.1.1-4.0 and up            -3478261.639 -204528014 197571491 1.0000000
## 7.0 and up-4.0 and up               843395.504  -30669904  32356695 1.0000000
## 7.1 and up-4.0 and up             29205071.694  -86961283 145371427 1.0000000
## 8.0 and up-4.0 and up             -4226577.306  -86464201  78011047 1.0000000
## NaN-4.0 and up                    -4472761.639 -146691668 137746145 1.0000000
## Varies with device-4.0 and up     34996100.767   26497329  43494872 0.0000000
## 4.0.3 and up-4.0.3 - 7.1.1        -3831554.349 -146041747 138378638 1.0000000
## 4.1 - 7.1.1-4.0.3 - 7.1.1         92500000.000 -153638898 338638898 0.9999806
## 4.1 and up-4.0.3 - 7.1.1          -1678411.540 -143851292 140494469 1.0000000
## 4.2 and up-4.0.3 - 7.1.1          -3834643.403 -146324502 138655215 1.0000000
## 4.3 and up-4.0.3 - 7.1.1          -4924001.262 -147673936 137825934 1.0000000
## 4.4 and up-4.0.3 - 7.1.1          -1457726.775 -143739706 140824253 1.0000000
## 4.4W and up-4.0.3 - 7.1.1         -7497520.909 -161985595 146990553 1.0000000
## 5.0 - 6.0-4.0.3 - 7.1.1           -7490000.000 -253628898 238648898 1.0000000
## 5.0 - 7.1.1-4.0.3 - 7.1.1         -7499995.000 -253638893 238638903 1.0000000
## 5.0 - 8.0-4.0.3 - 7.1.1            2500000.000 -198471569 203471569 1.0000000
## 5.0 and up-4.0.3 - 7.1.1          -4332409.486 -146718053 138053234 1.0000000
## 5.1 and up-4.0.3 - 7.1.1          -7208101.818 -155635442 141219238 1.0000000
## 6.0 and up-4.0.3 - 7.1.1          -6525066.833 -151241136 138191003 1.0000000
## 7.0 - 7.1.1-4.0.3 - 7.1.1         -6500000.000 -252638898 239638898 1.0000000
## 7.0 and up-4.0.3 - 7.1.1          -2178342.857 -147630886 143274200 1.0000000
## 7.1 and up-4.0.3 - 7.1.1          26183333.333 -157277769 209644436 1.0000000
## 8.0 and up-4.0.3 - 7.1.1          -7248315.667 -171340914 156844283 1.0000000
## NaN-4.0.3 - 7.1.1                 -7494500.000 -208466069 193477069 1.0000000
## Varies with device-4.0.3 - 7.1.1  31974362.406 -110277468 174226193 1.0000000
## 4.1 - 7.1.1-4.0.3 and up          96331554.349 -104712034 297375143 0.9974405
## 4.1 and up-4.0.3 and up            2153142.809   -4724016   9030302 0.9999998
## 4.2 and up-4.0.3 and up              -3089.054  -11730289  11724111 1.0000000
## 4.3 and up-4.0.3 and up           -1092446.913  -15642751  13457857 1.0000000
## 4.4 and up-4.0.3 and up            2373827.574   -6476545  11224201 1.0000000
## 4.4W and up-4.0.3 and up          -3665966.560  -64499611  57167678 1.0000000
## 5.0 - 6.0-4.0.3 and up            -3658445.651 -204702034 197385143 1.0000000
## 5.0 - 7.1.1-4.0.3 and up          -3668440.651 -204712029 197375148 1.0000000
## 5.0 - 8.0-4.0.3 and up             6331554.349 -135878638 148541747 1.0000000
## 5.0 and up-4.0.3 and up            -500855.137  -10885410   9883700 1.0000000
## 5.1 and up-4.0.3 and up           -3376547.469  -46560372  39807277 1.0000000
## 6.0 and up-4.0.3 and up           -2693512.484  -30566584  25179559 1.0000000
## 7.0 - 7.1.1-4.0.3 and up          -2668445.651 -203712034 198375143 1.0000000
## 7.0 and up-4.0.3 and up            1653211.492  -29820739  33127162 1.0000000
## 7.1 and up-4.0.3 and up           30014887.682  -86140799 146170574 1.0000000
## 8.0 and up-4.0.3 and up           -3416761.318  -85639315  78805792 1.0000000
## NaN-4.0.3 and up                  -3662945.651 -145873138 138547247 1.0000000
## Varies with device-4.0.3 and up   35805916.755   27454232  44157602 0.0000000
## 4.1 and up-4.1 - 7.1.1           -94178411.540 -295195609 106838786 0.9982794
## 4.2 and up-4.1 - 7.1.1           -96334643.403 -297576154 104906867 0.9974826
## 4.3 and up-4.1 - 7.1.1           -97424001.262 -298849744 104001741 0.9969900
## 4.4 and up-4.1 - 7.1.1           -93957726.775 -295052101 107136648 0.9983618
## 4.4W and up-4.1 - 7.1.1          -99997520.909 -309905479 109910437 0.9976871
## 5.0 - 6.0-4.1 - 7.1.1            -99990000.000 -384206718 184226718 0.9999958
## 5.0 - 7.1.1-4.1 - 7.1.1          -99999995.000 -384216713 184216723 0.9999958
## 5.0 - 8.0-4.1 - 7.1.1            -90000000.000 -336138898 156138898 0.9999896
## 5.0 and up-4.1 - 7.1.1           -96832409.486 -298000144 104335325 0.9972294
## 5.1 and up-4.1 - 7.1.1           -99708101.818 -305196450 105780246 0.9968200
## 6.0 and up-4.1 - 7.1.1           -99025066.833 -301848947 103798814 0.9964674
## 7.0 - 7.1.1-4.1 - 7.1.1          -99000000.000 -383216718 185216718 0.9999967
## 7.0 and up-4.1 - 7.1.1           -94678342.857 -298028356 108671670 0.9984625
## 7.1 and up-4.1 - 7.1.1           -66316666.667 -298378645 165745312 1.0000000
## 8.0 and up-4.1 - 7.1.1           -99748315.667 -316822420 117325788 0.9987903
## NaN-4.1 - 7.1.1                  -99994500.000 -346133398 146144398 0.9998939
## Varies with device-4.1 - 7.1.1   -60525637.594 -261598682 140547406 0.9999999
## 4.2 and up-4.1 and up             -2156231.863  -13421947   9109483 1.0000000
## 4.3 and up-4.1 and up             -3245589.722  -17426579  10935400 1.0000000
## 4.4 and up-4.1 and up               220684.765   -8008424   8449793 1.0000000
## 4.4W and up-4.1 and up            -5819109.369  -66565479  54927260 1.0000000
## 5.0 - 6.0-4.1 and up              -5811588.460 -206828786 195205609 1.0000000
## 5.0 - 7.1.1-4.1 and up            -5821583.460 -206838781 195195614 1.0000000
## 5.0 - 8.0-4.1 and up               4178411.540 -137994469 146351292 1.0000000
## 5.0 and up-4.1 and up             -2653997.946  -12514429   7206433 1.0000000
## 5.1 and up-4.1 and up             -5529690.278  -48590483  37531102 1.0000000
## 6.0 and up-4.1 and up             -4846655.293  -32528730  22835419 1.0000000
## 7.0 - 7.1.1-4.1 and up            -4821588.460 -205838786 196195609 1.0000000
## 7.0 and up-4.1 and up              -499931.317  -31804862  30804999 1.0000000
## 7.1 and up-4.1 and up             27861744.873  -88248258 143971747 1.0000000
## 8.0 and up-4.1 and up             -5569904.127  -87727907  76588099 1.0000000
## NaN-4.1 and up                    -5816088.460 -147988969 136356792 1.0000000
## Varies with device-4.1 and up     33652773.946   25962535  41343012 0.0000000
## 4.3 and up-4.2 and up             -1089357.859  -18157819  15979103 1.0000000
## 4.4 and up-4.2 and up              2376916.628  -10190904  14944737 1.0000000
## 4.4W and up-4.2 and up            -3662877.506  -65147456  57821701 1.0000000
## 5.0 - 6.0-4.2 and up              -3655356.597 -204896867 197586154 1.0000000
## 5.0 - 7.1.1-4.2 and up            -3665351.597 -204906862 197576159 1.0000000
## 5.0 - 8.0-4.2 and up               6334643.403 -136155215 148824502 1.0000000
## 5.0 and up-4.2 and up              -497766.083  -14189369  13193837 1.0000000
## 5.1 and up-4.2 and up             -3373458.415  -47469532  40722615 1.0000000
## 6.0 and up-4.2 and up             -2690423.430  -31956943  26576096 1.0000000
## 7.0 - 7.1.1-4.2 and up            -2665356.597 -203906867 198576154 1.0000000
## 7.0 and up-4.2 and up              1656300.546  -31058078  34370679 1.0000000
## 7.1 and up-4.2 and up             30017976.737  -86479940 146515893 1.0000000
## 8.0 and up-4.2 and up             -3413672.263  -86118989  79291644 1.0000000
## NaN-4.2 and up                    -3659856.597 -146149715 138830002 1.0000000
## Varies with device-4.2 and up     35809005.809   23587236  48030776 0.0000000
## 4.4 and up-4.3 and up              3466274.487  -11769678  18702227 1.0000000
## 4.4W and up-4.3 and up            -2573519.647  -64658441  59511402 1.0000000
## 5.0 - 6.0-4.3 and up              -2565998.738 -203991741 198859744 1.0000000
## 5.0 - 7.1.1-4.3 and up            -2575993.738 -204001736 198849749 1.0000000
## 5.0 - 8.0-4.3 and up               7424001.262 -135325934 150173936 1.0000000
## 5.0 and up-4.3 and up               591591.776  -15583825  16767009 1.0000000
## 5.1 and up-4.3 and up             -2284100.556  -47213466  42645265 1.0000000
## 6.0 and up-4.3 and up             -1601065.571  -32108659  28906528 1.0000000
## 7.0 - 7.1.1-4.3 and up            -1575998.738 -203001741 199849744 1.0000000
## 7.0 and up-4.3 and up              2745658.405  -31083541  36574857 1.0000000
## 7.1 and up-4.3 and up             31107334.596  -85708541 147923210 1.0000000
## 8.0 and up-4.3 and up             -2324314.404  -85476906  80828277 1.0000000
## NaN-4.3 and up                    -2570498.738 -145320434 140179436 1.0000000
## Varies with device-4.3 and up     36898363.669   21946582  51850146 0.0000000
## 4.4W and up-4.4 and up            -6039794.134  -67041066  54961478 1.0000000
## 5.0 - 6.0-4.4 and up              -6032273.225 -207126648 195062101 1.0000000
## 5.0 - 7.1.1-4.4 and up            -6042268.225 -207136643 195052106 1.0000000
## 5.0 - 8.0-4.4 and up               3957726.775 -138324253 146239706 1.0000000
## 5.0 and up-4.4 and up             -2874682.711  -14199959   8450594 1.0000000
## 5.1 and up-4.4 and up             -5750375.043  -49170021  37669270 1.0000000
## 6.0 and up-4.4 and up             -5067340.058  -33304391  23169711 1.0000000
## 7.0 - 7.1.1-4.4 and up            -5042273.225 -206136648 196052101 1.0000000
## 7.0 and up-4.4 and up              -720616.082  -32517353  31076121 1.0000000
## 7.1 and up-4.4 and up             27641060.108  -88602505 143884625 1.0000000
## 8.0 and up-4.4 and up             -5790588.892  -88137242  76556064 1.0000000
## NaN-4.4 and up                    -6036773.225 -148318753 136245206 1.0000000
## Varies with device-4.4 and up     33432089.181   23936114  42928065 0.0000000
## 5.0 - 6.0-4.4W and up                 7520.909 -209900437 209915479 1.0000000
## 5.0 - 7.1.1-4.4W and up              -2474.091 -209910432 209905484 1.0000000
## 5.0 - 8.0-4.4W and up              9997520.909 -144490553 164485595 1.0000000
## 5.0 and up-4.4W and up             3165111.423  -58077562  64407785 1.0000000
## 5.1 and up-4.4W and up              289419.091  -73924251  74503089 1.0000000
## 6.0 and up-4.4W and up              972454.076  -65508628  67453536 1.0000000
## 7.0 - 7.1.1-4.4W and up             997520.909 -208910437 210905479 1.0000000
## 7.0 and up-4.4W and up             5319178.052  -62750167  73388523 1.0000000
## 7.1 and up-4.4W and up            33680854.242  -97219756 164581464 1.0000000
## 8.0 and up-4.4W and up              249205.242 -101747728 102246139 1.0000000
## NaN-4.4W and up                       3020.909 -154485053 154491095 1.0000000
## Varies with device-4.4W and up    39471883.315  -21459035 100402801 0.8508697
## 5.0 - 7.1.1-5.0 - 6.0                -9995.000 -284226713 284206723 1.0000000
## 5.0 - 8.0-5.0 - 6.0                9990000.000 -236148898 256128898 1.0000000
## 5.0 and up-5.0 - 6.0               3157590.514 -198010144 204325325 1.0000000
## 5.1 and up-5.0 - 6.0                281898.182 -205206450 205770246 1.0000000
## 6.0 and up-5.0 - 6.0                964933.167 -201858947 203788814 1.0000000
## 7.0 - 7.1.1-5.0 - 6.0               990000.000 -283226718 285206718 1.0000000
## 7.0 and up-5.0 - 6.0               5311657.143 -198038356 208661670 1.0000000
## 7.1 and up-5.0 - 6.0              33673333.333 -198388645 265735312 1.0000000
## 8.0 and up-5.0 - 6.0                241684.333 -216832420 217315788 1.0000000
## NaN-5.0 - 6.0                        -4500.000 -246143398 246134398 1.0000000
## Varies with device-5.0 - 6.0      39464362.406 -161608682 240537406 1.0000000
## 5.0 - 8.0-5.0 - 7.1.1              9999995.000 -236138903 256138893 1.0000000
## 5.0 and up-5.0 - 7.1.1             3167585.514 -198000149 204335320 1.0000000
## 5.1 and up-5.0 - 7.1.1              291893.182 -205196455 205780241 1.0000000
## 6.0 and up-5.0 - 7.1.1              974928.167 -201848952 203798809 1.0000000
## 7.0 - 7.1.1-5.0 - 7.1.1             999995.000 -283216723 285216713 1.0000000
## 7.0 and up-5.0 - 7.1.1             5321652.143 -198028361 208671665 1.0000000
## 7.1 and up-5.0 - 7.1.1            33683328.333 -198378650 265745307 1.0000000
## 8.0 and up-5.0 - 7.1.1              251679.333 -216822425 217325783 1.0000000
## NaN-5.0 - 7.1.1                       5495.000 -246133403 246144393 1.0000000
## Varies with device-5.0 - 7.1.1    39474357.406 -161598687 240547401 1.0000000
## 5.0 and up-5.0 - 8.0              -6832409.486 -149218053 135553234 1.0000000
## 5.1 and up-5.0 - 8.0              -9708101.818 -158135442 138719238 1.0000000
## 6.0 and up-5.0 - 8.0              -9025066.833 -153741136 135691003 1.0000000
## 7.0 - 7.1.1-5.0 - 8.0             -9000000.000 -255138898 237138898 1.0000000
## 7.0 and up-5.0 - 8.0              -4678342.857 -150130886 140774200 1.0000000
## 7.1 and up-5.0 - 8.0              23683333.333 -159777769 207144436 1.0000000
## 8.0 and up-5.0 - 8.0              -9748315.667 -173840914 154344283 1.0000000
## NaN-5.0 - 8.0                     -9994500.000 -210966069 190977069 1.0000000
## Varies with device-5.0 - 8.0      29474362.406 -112777468 171726193 1.0000000
## 5.1 and up-5.0 and up             -2875692.332  -46633840  40882455 1.0000000
## 6.0 and up-5.0 and up             -2192657.347  -30947499  26562184 1.0000000
## 7.0 - 7.1.1-5.0 and up            -2167590.514 -203335325 199000144 1.0000000
## 7.0 and up-5.0 and up              2154066.629  -30103372  34411505 1.0000000
## 7.1 and up-5.0 and up             30515742.820  -85854685 146886170 1.0000000
## 8.0 and up-5.0 and up             -2915906.180  -85441545  79609733 1.0000000
## NaN-5.0 and up                    -3162090.514 -145547734 139223553 1.0000000
## Varies with device-5.0 and up     36306771.892   25366780  47246764 0.0000000
## 6.0 and up-5.1 and up               683034.985  -50148497  51514567 1.0000000
## 7.0 - 7.1.1-5.1 and up              708101.818 -204780246 206196450 1.0000000
## 7.0 and up-5.1 and up              5029758.961  -47862075  57921593 1.0000000
## 7.1 and up-5.1 and up             33391435.152  -90298015 157080885 1.0000000
## 8.0 and up-5.1 and up               -40213.848  -92600923  92520495 1.0000000
## NaN-5.1 and up                     -286398.182 -148713738 148140942 1.0000000
## Varies with device-5.1 and up     39182464.224   -4138283  82503212 0.1558725
## 7.0 - 7.1.1-6.0 and up               25066.833 -202798814 202848947 1.0000000
## 7.0 and up-6.0 and up              4346723.976  -37000724  45694172 1.0000000
## 7.1 and up-6.0 and up             32708400.167  -86502109 151918909 1.0000000
## 8.0 and up-6.0 and up              -723248.833  -87207642  85761144 1.0000000
## NaN-6.0 and up                     -969433.167 -145685503 143746636 1.0000000
## Varies with device-6.0 and up     38499429.239   10414690  66584168 0.0000952
## 7.0 and up-7.0 - 7.1.1             4321657.143 -199028356 207671670 1.0000000
## 7.1 and up-7.0 - 7.1.1            32683333.333 -199378645 264745312 1.0000000
## 8.0 and up-7.0 - 7.1.1             -748315.667 -217822420 216325788 1.0000000
## NaN-7.0 - 7.1.1                    -994500.000 -247133398 245144398 1.0000000
## Varies with device-7.0 - 7.1.1    38474362.406 -162598682 239547406 1.0000000
## 7.1 and up-7.0 and up             28361676.190  -91741808 148465161 1.0000000
## 8.0 and up-7.0 and up             -5069972.810  -92781156  82641211 1.0000000
## NaN-7.0 and up                    -5316157.143 -150768700 140136386 1.0000000
## Varies with device-7.0 and up     34152705.263    2491151  65814259 0.0165970
## 8.0 and up-7.1 and up            -33431649.000 -175540008 108676710 1.0000000
## NaN-7.1 and up                   -33677833.333 -217138936 149783269 1.0000000
## Varies with device-7.1 and up      5791029.073 -110415632 121997690 1.0000000
## NaN-8.0 and up                     -246184.333 -164338783 163846414 1.0000000
## Varies with device-8.0 and up     39222678.073  -43071871 121517227 0.9976678
## Varies with device-NaN            39468862.406 -102782968 181720693 1.0000000
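The pairwise table above appears to be `TukeyHSD()` output for installs by required Android version. Its columns are `diff` (difference in group means), `lwr`/`upr` (95% family-wise confidence limits) and `p adj` (Tukey-adjusted p-value). Most rows have adjusted p-values close to 1, so the handful of significant pairs are easy to miss. The base-R sketch below keeps only pairs with `p adj` below 0.05 and sorts them by absolute difference. The toy data frame, its three version levels and the 0.05 cut-off are illustrative assumptions, not the project's data.

```r
# Illustrative toy data (assumption): three hypothetical Android.Ver levels,
# where only "Varies with device" has a higher mean
set.seed(7)
toy <- data.frame(
  version  = factor(rep(c("4.0 and up", "5.0 and up", "Varies with device"), each = 40)),
  installs = c(rnorm(40, mean = 10), rnorm(40, mean = 10), rnorm(40, mean = 14))
)

# One-way ANOVA followed by Tukey's honest significant difference test
fit   <- aov(installs ~ version, data = toy)
tukey <- TukeyHSD(fit, which = "version")

# Each term's result is a matrix with columns "diff", "lwr", "upr", "p adj"
comparisons <- tukey$version

# Keep only the pairs whose adjusted p-value is below 0.05 ...
significant <- comparisons[comparisons[, "p adj"] < 0.05, , drop = FALSE]

# ... and list the largest absolute mean differences first
significant <- significant[order(abs(significant[, "diff"]), decreasing = TRUE), , drop = FALSE]
print(significant)
```

Row names follow the same `level2-level1` pattern as the table above, so the filtered rows can be matched back to it directly.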

ANOVA test for Content Rating vs Installs

# Filter out rows where Installs is 0 or NA and where Content.Rating is NA
data_for_anova <- data_final %>%
  filter(!is.na(Content.Rating), !is.na(Installs), Installs > 0)

# Re-run the ANOVA test
install_anova <- aov(log10(Installs) ~ Content.Rating, data = data_for_anova)

# Display the results
cat("ANOVA test results for Installs by Content Rating:\n")
## ANOVA test results for Installs by Content Rating:
print(summary(install_anova))
##                  Df Sum Sq Mean Sq F value Pr(>F)    
## Content.Rating    5    743  148.68   41.95 <2e-16 ***
## Residuals      9638  34160    3.54                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

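ANOVA analysis: The one-way ANOVA on log10(Installs) revealed significant differences in install counts across content ratings (F(5, 9638) = 41.95, p < 2e-16). Content rating is therefore associated with how many installs an app receives, although the effect is modest: it accounts for only about 2% of the variance in log10(Installs) (a sum of squares of 743 out of a total of 34,903). Because content rating describes an app's intended audience rather than its quality, the result points to the target audience as one of the factors in attracting users.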
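With close to 10,000 apps, even small group differences produce very small p-values, so it helps to report an effect size next to the F test. One common choice for a one-way ANOVA is eta squared, the between-group sum of squares divided by the total sum of squares. The sketch below computes it from a fitted `aov` object; the helper name `eta_squared` and the toy data are hypothetical, and the formula as written only applies to a single-factor model.

```r
# Eta squared for a one-way ANOVA: share of total variance explained by the factor.
# `eta_squared` is a hypothetical helper, not part of base R.
eta_squared <- function(fit) {
  tab <- summary(fit)[[1]]   # ANOVA table with columns Df, Sum Sq, Mean Sq, F value, Pr(>F)
  ss  <- tab[["Sum Sq"]]
  ss[1] / sum(ss)            # row 1 = the factor, last row = Residuals (one-way only)
}

# Toy example (assumed data): log10 installs for three hypothetical rating groups
set.seed(1)
toy <- data.frame(
  rating       = factor(rep(c("Everyone", "Teen", "Mature 17+"), each = 50)),
  log_installs = c(rnorm(50, 5.0), rnorm(50, 5.3), rnorm(50, 5.6))
)
fit <- aov(log_installs ~ rating, data = toy)
eta_squared(fit)

# The same formula applied to the rounded sums of squares printed above
round(743 / (743 + 34160), 3)   # 0.021
```

For a one-way model, eta squared equals the R-squared of the equivalent `lm()` fit, which is a quick way to check the helper.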

Chi-square Test for Content Rating vs Last Updated

# 3.2 Statistical Tests

# Chi-square test for independence
contingency_table <- table(data_time_analysis$Content.Rating, data_time_analysis$update_quarter)

# Perform the Chi-square test
chi_test <- chisq.test(contingency_table)
cat("Chi-square test for independence between Content Rating and Update Quarter:\n")
## Chi-square test for independence between Content Rating and Update Quarter:
print(chi_test)
## 
##  Pearson's Chi-squared test
## 
## data:  contingency_table
## X-squared = 88.112, df = 15, p-value = 2.229e-12

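The p-value (2.229e-12) is far below 0.05 (X-squared = 88.112, df = 15), so we reject the null hypothesis that content rating and update quarter are independent: the mix of content ratings differs significantly across last-updated quarters. The test establishes an association but says nothing about its strength or direction.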
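Two checks are worth running alongside the chi-square test. First, the chi-square approximation is usually considered reliable only when expected cell counts are not too small (a common rule of thumb is at least 5; R's `chisq.test()` warns when any expected count falls below that). Second, with thousands of apps a weak association can still give a tiny p-value, so an effect size such as Cramér's V, V = sqrt(X² / (n (min(r, c) - 1))), is informative. The sketch below uses the `expected` and `observed` components of the `chisq.test()` result; the toy counts and the helper name `cramers_v` are assumptions for illustration.

```r
# Toy contingency table (assumed counts): content rating x update quarter
tab <- matrix(c(120, 130, 110, 140,
                 40,  35,  50,  45,
                 30,  25,  20,  35),
              nrow = 3, byrow = TRUE,
              dimnames = list(rating  = c("Everyone", "Teen", "Mature 17+"),
                              quarter = c("Q1", "Q2", "Q3", "Q4")))

test <- chisq.test(tab)

# Expected counts under independence; small values weaken the chi-square approximation
min(test$expected)

# Cramer's V: effect size between 0 (no association) and 1; `cramers_v` is a hypothetical helper
cramers_v <- function(test) {
  n <- sum(test$observed)
  k <- min(dim(test$observed)) - 1
  sqrt(unname(test$statistic) / (n * k))
}
cramers_v(test)
```

In the project, the same helper could be applied to `chi_test`, and `chi_test$expected` shows whether rare ratings leave any cells with small expected counts.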

Implications: These findings suggest that regular updates may be important for sustaining app installs, and that different content ratings are associated with different levels of user engagement. Strategies aimed at timely updates and at choosing an appropriate content rating could enhance app performance and user acquisition, although these tests show association rather than cause and effect.

Correlation

Correlation for all variables in data_final

Let's convert all the categorical variables into factors and then into numeric codes, so that a single correlation matrix can be calculated for the whole data frame.
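One caveat with this encoding: `as.numeric()` on a factor returns each value's position in the factor levels, which are alphabetical by default, so for a nominal variable such as Category the resulting Pearson or Spearman coefficient depends on an arbitrary ordering. The toy sketch below (simulated data and three hypothetical categories, not the Play Store data) shows the Spearman coefficient changing when only the level order changes, while a Kruskal-Wallis test, a rank-based test that treats the variable as unordered groups, gives the same result either way.

```r
set.seed(42)
# Toy data (assumed): GAME apps get about three times more installs
category <- factor(sample(c("FAMILY", "GAME", "TOOLS"), 300, replace = TRUE))
installs <- rlnorm(300, meanlog = 10) * ifelse(category == "GAME", 3, 1)

# Default alphabetical levels: FAMILY = 1, GAME = 2, TOOLS = 3
rho_default <- cor(as.numeric(category), installs, method = "spearman")

# Same data, different (equally arbitrary) level order: GAME = 1, FAMILY = 2, TOOLS = 3
category_relevelled <- factor(category, levels = c("GAME", "FAMILY", "TOOLS"))
rho_relevelled <- cor(as.numeric(category_relevelled), installs, method = "spearman")

c(default = rho_default, relevelled = rho_relevelled)   # noticeably different

# Kruskal-Wallis treats categories as unordered groups, so level order is irrelevant
kw_default    <- kruskal.test(installs ~ category)
kw_relevelled <- kruskal.test(installs ~ category_relevelled)
c(kw_default$statistic, kw_relevelled$statistic)        # identical
```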

# Step 1: Create a copy of the original data without the identifier and date-derived columns
columns_to_remove <- c("App", "Scaled_Reviews", "update_year", "update_month",
                       "update_quarter", "days_since_update", "week_of_year", "Last.Updated",
                       "day_of_week", "month_of_year", "season")

data_numeric_or_factor <- data_final %>%
  select(-any_of(columns_to_remove))

# Step 2: Convert any remaining character columns to factors
data_factor <- data_numeric_or_factor %>%
  mutate(across(where(is.character), as.factor))

# Step 3: Identify categorical (factor) columns
categorical_columns <- sapply(data_factor, is.factor)

# Step 4: Replace each factor with its integer level code
data_final_numeric <- data_factor
data_final_numeric[categorical_columns] <- lapply(data_factor[categorical_columns], as.numeric)


# Step 5: Calculate Pearson correlation
correlation_matrix <- cor(data_final_numeric,method = "pearson", use = "complete.obs")
print(correlation_matrix)
##                   Category      Rating      Reviews        Size     Installs
## Category        1.00000000 -0.03748571  0.017299610 -0.12509260  0.031658924
## Rating         -0.03748571  1.00000000  0.055012062  0.05624989  0.040068378
## Reviews         0.01729961  0.05501206  1.000000000  0.07552366  0.625164525
## Size           -0.12509260  0.05624989  0.075523661  1.00000000  0.040740457
## Installs        0.03165892  0.04006838  0.625164525  0.04074046  1.000000000
## Price          -0.01376932 -0.01953409 -0.007597749 -0.02157185 -0.009405171
## Content.Rating -0.09387557  0.02591249  0.055620981  0.18280324  0.049807143
## Android.Ver     0.09094536  0.05804513  0.106335063  0.07354916  0.158737503
## log_Installs    0.05879716  0.06610220  0.207743211  0.25750081  0.263707724
##                       Price Content.Rating  Android.Ver log_Installs
## Category       -0.013769321   -0.093875566  0.090945364   0.05879716
## Rating         -0.019534090    0.025912489  0.058045132   0.06610220
## Reviews        -0.007597749    0.055620981  0.106335063   0.20774321
## Size           -0.021571845    0.182803242  0.073549158   0.25750081
## Installs       -0.009405171    0.049807143  0.158737503   0.26370772
## Price           1.000000000   -0.014487940 -0.009672448  -0.05623643
## Content.Rating -0.014487940    1.000000000 -0.003608702   0.12136585
## Android.Ver    -0.009672448   -0.003608702  1.000000000   0.23854322
## log_Installs   -0.056236427    0.121365850  0.238543220   1.00000000
# Calculate the Spearman (rank-based) correlation matrix
correlation_matrix1 <- cor(data_final_numeric, method = "spearman", use = "complete.obs")
print(correlation_matrix1)
##                   Category       Rating     Reviews        Size    Installs
## Category        1.00000000 -0.023074576  0.05850162 -0.11449196  0.06612057
## Rating         -0.02307458  1.000000000  0.20073076  0.07385874  0.11994456
## Reviews         0.05850162  0.200730765  1.00000000  0.33110583  0.96770658
## Size           -0.11449196  0.073858737  0.33110583  1.00000000  0.31032364
## Installs        0.06612057  0.119944559  0.96770658  0.31032364  1.00000000
## Price           0.01311436  0.053828940 -0.15071272 -0.04391102 -0.23202906
## Content.Rating -0.10818522  0.006106387  0.16509252  0.19593634  0.13938983
## Android.Ver     0.09009285  0.079794210  0.19115306  0.24624117  0.19460722
## log_Installs    0.06612057  0.119944559  0.96770658  0.31032364  1.00000000
##                      Price Content.Rating  Android.Ver log_Installs
## Category        0.01311436   -0.108185222  0.090092845   0.06612057
## Rating          0.05382894    0.006106387  0.079794210   0.11994456
## Reviews        -0.15071272    0.165092522  0.191153061   0.96770658
## Size           -0.04391102    0.195936335  0.246241166   0.31032364
## Installs       -0.23202906    0.139389834  0.194607223   1.00000000
## Price           1.00000000   -0.037143541 -0.099022432  -0.23202906
## Content.Rating -0.03714354    1.000000000 -0.005692711   0.13938983
## Android.Ver    -0.09902243   -0.005692711  1.000000000   0.19460722
## log_Installs   -0.23202906    0.139389834  0.194607223   1.00000000
# Step 6: Plot the correlation matrix
corrplot(correlation_matrix, method = "color", addCoef.col = "black")

Installs is most strongly correlated with Reviews (Pearson r = 0.63, Spearman ρ = 0.97).

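The Pearson and Spearman matrices differ noticeably. Pearson's r measures linear association and is dominated by the few apps with extremely large install and review counts, whereas Spearman's ρ works on ranks and captures any monotonic relationship; for example, Reviews vs Installs rises from 0.63 (Pearson) to 0.97 (Spearman). Because ranks are unchanged by a log transform, Installs and log_Installs have identical Spearman coefficients. Spearman is therefore the more suitable reference for the skewed count variables and for the encoded categorical variables, although the coefficients for Category and Content.Rating should be read with caution: their integer codes follow the factor-level order (alphabetical by default), which is not a meaningful ranking.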
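The gap between the two matrices is typical of heavily right-skewed counts such as installs and reviews. The toy sketch below uses simulated log-normal data (an assumption, not the Play Store data) to show that Spearman's ρ depends only on ranks, so it is unchanged by a log10 transform, whereas Pearson's r changes with it.

```r
set.seed(3)
# Simulated, heavily skewed counts (assumed): installs roughly proportional to reviews
reviews  <- rlnorm(500, meanlog = 6, sdlog = 2)
installs <- reviews * rlnorm(500, meanlog = 3, sdlog = 0.5)

pearson_raw  <- cor(reviews, installs)                    # sensitive to a few huge values
pearson_log  <- cor(log10(reviews), log10(installs))      # after reducing the skew
spearman_raw <- cor(reviews, installs, method = "spearman")
spearman_log <- cor(log10(reviews), log10(installs), method = "spearman")

round(c(pearson_raw = pearson_raw, pearson_log = pearson_log,
        spearman_raw = spearman_raw, spearman_log = spearman_log), 3)
```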

Correlation with Reviews

reviews_correlation_factor <- correlation_matrix[, "Reviews", drop = FALSE]

reviews_correlation_factor1 <- correlation_matrix1[, "Reviews", drop = FALSE]

# Print the correlation matrix for Reviews from numeric factor data
print(reviews_correlation_factor)
##                     Reviews
## Category        0.017299610
## Rating          0.055012062
## Reviews         1.000000000
## Size            0.075523661
## Installs        0.625164525
## Price          -0.007597749
## Content.Rating  0.055620981
## Android.Ver     0.106335063
## log_Installs    0.207743211
# Plot the Pearson and Spearman correlations of Reviews with the other variables
corrplot(reviews_correlation_factor, method = "color", addCoef.col = "black",
         title = "Pearson Correlation of Reviews with Other Variables",
         tl.col = "black", tl.srt = 45, mar = c(0, 0, 1, 0))

corrplot(reviews_correlation_factor1, method = "color", addCoef.col = "black",
         title = "Spearman Correlation of Reviews with Other Variables",
         tl.col = "black", tl.srt = 45, mar = c(0, 0, 1, 0))

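Reviews has its strongest (positive) correlation with Installs (Pearson r = 0.63, Spearman ρ = 0.97). In the Spearman matrix it also shows weaker positive correlations with Size (0.33), Rating (0.20), Android.Ver (0.19) and Content.Rating (0.17), and a weak negative correlation with Price (-0.15).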
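Reading a single column by eye makes it easy to misorder the coefficients. The small helper below (hypothetical, not from any package) drops the variable's self-correlation and sorts the remaining coefficients by absolute value; in this project it could be called as `top_correlates(correlation_matrix1, "Reviews")`. The toy 4 x 4 matrix reuses rounded Spearman values from the output above.

```r
# Hypothetical helper: correlations of `var` with every other variable,
# strongest (by absolute value) first
top_correlates <- function(cor_mat, var) {
  r <- cor_mat[, var]
  r <- r[names(r) != var]                 # drop the trivial self-correlation
  r[order(abs(r), decreasing = TRUE)]
}

# Toy matrix built from rounded Spearman values printed above
vars <- c("Reviews", "Installs", "Size", "Content.Rating")
m <- matrix(c(1.000, 0.968, 0.331, 0.165,
              0.968, 1.000, 0.310, 0.139,
              0.331, 0.310, 1.000, 0.196,
              0.165, 0.139, 0.196, 1.000),
            nrow = 4, dimnames = list(vars, vars))

top_correlates(m, "Reviews")
```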

Correlation with Rating

rating_correlation_factor <- correlation_matrix[, "Rating", drop = FALSE]

rating_correlation_factor1 <- correlation_matrix1[, "Rating", drop = FALSE]

# Print the Pearson correlations of Rating with the other variables
print(rating_correlation_factor)
##                     Rating
## Category       -0.03748571
## Rating          1.00000000
## Reviews         0.05501206
## Size            0.05624989
## Installs        0.04006838
## Price          -0.01953409
## Content.Rating  0.02591249
## Android.Ver     0.05804513
## log_Installs    0.06610220
# Plot the Pearson and Spearman correlations of Rating with the other variables
corrplot(rating_correlation_factor, method = "color", addCoef.col = "black",
         title = "Pearson Correlation of Rating with Other Variables",
         tl.col = "black", tl.srt = 45, mar = c(0, 0, 1, 0))

corrplot(rating_correlation_factor1, method = "color", addCoef.col = "black",
         title = "Spearman Correlation of Rating with Other Variables",
         tl.col = "black", tl.srt = 45, mar = c(0, 0, 1, 0))

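Rating is only weakly correlated with every other variable. It is slightly positively correlated with Reviews (Spearman ρ = 0.20) and Installs (ρ = 0.12), and all of its Pearson coefficients are below 0.07 in absolute value, which is consistent with the visualisations shown previously.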

Correlation with Price

# correlation for Price
price_correlation_factor1 <- correlation_matrix1[, "Price", drop = FALSE]
print("Spearman Correlation of Price with Other Variables:")
## [1] "Spearman Correlation of Price with Other Variables:"
print(price_correlation_factor1)
##                      Price
## Category        0.01311436
## Rating          0.05382894
## Reviews        -0.15071272
## Size           -0.04391102
## Installs       -0.23202906
## Price           1.00000000
## Content.Rating -0.03714354
## Android.Ver    -0.09902243
## log_Installs   -0.23202906
# Plot for correlation with Price
corrplot(price_correlation_factor1, method = "color", addCoef.col = "black", 
         title = "Correlation of Price with Other Variables (Spearman)", 
         tl.col = "black", tl.srt = 45)

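Price vs. log_Installs: Spearman ρ = -0.23 (Pearson r = -0.06), a weak negative relationship: paid and more expensive apps tend to have fewer installs. Price is also weakly negatively correlated with Reviews (ρ = -0.15).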

Correlation with Category

# Correlation for Category
# Extract the Category column of the Spearman matrix (Category is an integer-coded factor)
category_correlation_factor1 <- correlation_matrix1[, "Category", drop = FALSE]
print("Spearman Correlation of Category with Other Variables:")
## [1] "Spearman Correlation of Category with Other Variables:"
print(category_correlation_factor1)
##                   Category
## Category        1.00000000
## Rating         -0.02307458
## Reviews         0.05850162
## Size           -0.11449196
## Installs        0.06612057
## Price           0.01311436
## Content.Rating -0.10818522
## Android.Ver     0.09009285
## log_Installs    0.06612057
# Plot for correlation with Category
corrplot(category_correlation_factor1, method = "color", addCoef.col = "black", 
         title = "Correlation of Category with Other Variables (Spearman)", 
         tl.col = "black", tl.srt = 45)

Correlation with Android Version

# Correlation for Android Version (Android.Ver is an integer-coded factor)
version_correlation_factor1 <- correlation_matrix1[, "Android.Ver", drop = FALSE]
print("Spearman Correlation of Android Version with Other Variables:")
## [1] "Spearman Correlation of Android Version with Other Variables:"
print(version_correlation_factor1)
##                 Android.Ver
## Category        0.090092845
## Rating          0.079794210
## Reviews         0.191153061
## Size            0.246241166
## Installs        0.194607223
## Price          -0.099022432
## Content.Rating -0.005692711
## Android.Ver     1.000000000
## log_Installs    0.194607223
# Plot for correlation with Android Version
corrplot(version_correlation_factor1, method = "color", addCoef.col = "black", 
         title = "Correlation of Android Version with Other Variables (Spearman)", 
         tl.col = "black", tl.srt = 45)

Correlation between time-analysis variables and Installs

# Create a new data frame with relevant variables for correlation analysis
correlation_data <- data_time_analysis %>%
  select(days_since_update, update_year, update_month) %>%
  mutate(log_installs = log10(data_final$Installs))

# Calculate the correlation matrix
correlation_matrix <- cor(correlation_data, method = "spearman", use = "complete.obs")

# Print the correlation matrix
print("Spearman Correlation Matrix:")
## [1] "Spearman Correlation Matrix:"
corrplot(correlation_matrix, method = "color", 
          col = colorRampPalette(c("red", "white", "blue"))(200),
          type = "upper", 
          tl.col = "black", tl.srt = 45, 
          addCoef.col = "black", # Add correlation coefficients
          number.cex = 0.7,      # Adjust size of numbers
          title = "Correlation Matrix", # Title
          mar = c(0, 0, 1, 0))   # Margins

Correlation analysis: A moderate negative Spearman correlation (ρ = −0.3317) was found between the number of days since the last update and the log-transformed installs: apps that have gone longer without an update tend to have fewer installs. The relationship is statistically significant (p < 2.2e-16), suggesting that timely updates may be important for maintaining user engagement.
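`cor()` returns coefficients only; a p-value for a single pair, such as days since update vs log10 installs, comes from `cor.test()`. The sketch below runs it on simulated data (an assumption, not the project's columns). Setting `exact = FALSE` requests the asymptotic p-value, which avoids the warning R gives when tied values prevent an exact Spearman p-value.

```r
set.seed(11)
# Simulated data (assumed): installs drift downward as an app goes longer without updates
days_since_update <- sample(0:2000, 1000, replace = TRUE)
log_installs <- 6 - 0.0008 * days_since_update + rnorm(1000, sd = 1)

res <- cor.test(days_since_update, log_installs, method = "spearman", exact = FALSE)
res$estimate   # Spearman's rho (negative here by construction)
res$p.value
```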

Correlation between Installs and Size

# Calculate Pearson correlation and perform the test
cor_test <- cor.test(data_clean$Size, data_clean$Installs, method = "pearson")

# Output the correlation coefficient and p-value
cor_test
## 
##  Pearson's product-moment correlation
## 
## data:  data_clean$Size and data_clean$Installs
## t = 4.0069, df = 9657, p-value = 6.198e-05
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.02081430 0.06063426
## sample estimates:
##        cor 
## 0.04074046

According to the correlation hypothesis test:

Correlation coefficient: The Pearson correlation coefficient is 0.0407, a very weak positive relationship between Size and Installs. Larger apps tend to have slightly more installs, but Size explains less than 0.2% of the variance in Installs (r² ≈ 0.0017).

P-value: The p-value is 6.198e-05 (0.00006198), far below the conventional 0.05 significance level, so we reject the null hypothesis that the correlation is zero and conclude that Size and Installs are linearly associated. With about 9,659 apps (df = 9657), even a very small correlation reaches significance, so statistical significance here does not imply practical importance.

Confidence interval: The 95% confidence interval for the correlation coefficient runs from 0.0208 to 0.0606. This range is narrow and close to zero, confirming that while the relationship is statistically significant, it is very weak. The Spearman coefficient for Size and Installs in the earlier correlation matrix is considerably larger (0.31), so the weak Pearson value partly reflects the heavy skew of raw install counts rather than the absence of any monotonic relationship.
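All three quantities can be reproduced from the correlation coefficient and the sample size alone, which makes clear why such a small correlation is still significant: the t statistic grows with the square root of n - 2. The sketch below uses r = 0.04074046 and n = 9659 (df + 2) from the `cor.test()` output above; the Fisher z interval is the one `cor.test()` reports for Pearson correlations.

```r
r <- 0.04074046          # Pearson r from the cor.test output above
n <- 9657 + 2            # df = n - 2

# t statistic and two-sided p-value for H0: rho = 0
t_stat  <- r * sqrt((n - 2) / (1 - r^2))
p_value <- 2 * pt(-abs(t_stat), df = n - 2)

# 95% confidence interval via Fisher's z transformation
z  <- atanh(r)
se <- 1 / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

# Share of variance in Installs linearly explained by Size
r_squared <- r^2

# t ≈ 4.0069, p ≈ 6.198e-05, CI ≈ [0.0208, 0.0606], r^2 ≈ 0.0017
c(t = t_stat, p = p_value, lower = ci[1], upper = ci[2], r2 = r_squared)
```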